ML Modelos Supervisionados Clássicos
Projeto Final (2) - House Prices
Preparação Inicial
Definir diretório e pacotes necessários.
load.pks = c(
"readr",
"ggplot2",
"plotly",
"e1071",
"dplyr",
"tidyr",
"nortest",
"gridExtra",
"kableExtra",
"visdat",
"stringr",
"mlbench",
"caret",
"GGally",
"RColorBrewer",
"ROCR",
"pROC",
"rmarkdown",
"qqplotr",
"reshape2",
"ggfortify",
"car",
"vip",
"MASS"
)
lapply(load.pks, require, character.only = TRUE)Entendimento do Negócio [Business Understanding]
Objetivo
O objetivo é prever o perço final de uma casa utilizando o banco de
dados que contem uma lista de casas em Ames, Iowa. Esse banco de dados
conta com 79 variáveis explicativas que ajudam a determinar o preço das
casas.
Entendimento dos Dados [Data Understanding]
Descrição das Variáveis
train.csv - Registros de cerca de metade (~1460) das casas.
Id - Um ID exclusivo para cada casa.
MSSubClass - Identifica o tipo de habitação.
- 20: 1-STORY 1946 & NEWER ALL STYLES
- 30: 1-STORY 1945 & OLDER
- 40: 1-STORY W/FINISHED ATTIC ALL AGES
- 45: 1-1/2 STORY - UNFINISHED ALL AGES
- 50: 1-1/2 STORY FINISHED ALL AGES
- 60: 2-STORY 1946 & NEWER
- 70: 2-STORY 1945 & OLDER
- 75: 2-1/2 STORY ALL AGES
- 80: SPLIT OR MULTI-LEVEL
- 85: SPLIT FOYER
- 90: DUPLEX - ALL STYLES AND AGES
- 120: 1-STORY PUD (Planned Unit Development) - 1946 & NEWER
- 150: 1-1/2 STORY PUD - ALL AGES
- 160: 2-STORY PUD - 1946 & NEWER
- 180: PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
- 190: 2 FAMILY CONVERSION - ALL STYLES AND AGES
MSZoning - Identifica a área em que está situado o imóvel.
- A: Agriculture
- C: Commercial
- FV: Floating Village Residential
- I: Industrial
- RH: Residential High Density
- RL: Residential Low Density
- RP: Residential Low Density Park
- RM: Residential Medium Density
LotFrontage - Distância em pés da área da frente da casa até a rua.
LotArea - Tamanho da casa em pés quadrados.
Street - Identifica o tipo de rua que dá acesso à propriedade.
- Grvl: Gravel
- Pave: Paved
Alley - Identifica o tipo de beco/ruela que dá acesso à propriedade.
- Grvl: Gravel
- Pave: Paved
- NA: No alley access
LotShape - Identifica o formato do terreno da propriedade.
- Reg: Regular
- IR1: Slightly irregular
- IR2: Moderately Irregular
- IR3: Irregular
- Reg: Regular
LandContour - Identifica o quao plano é o terreno da propriedade.
- Lvl: Near Flat/Level
- Bnk: Banked - Quick and significant rise from street grade to building
- HLS: Hillside - Significant slope from side to side
- Low: Depression
Utilities - Identifica o tipo de utilidades disponíveis.
- AllPub: All public Utilities (E,G,W,& S)
- NoSewr: Electricity, Gas, and Water (Septic Tank)
- NoSeWa: Electricity and Gas Only
- ELO: Electricity only
LotConfig - Identifica a configuração do terreno.
- Inside: Inside lot
- Corner: Corner lot
- CulDSac: Cul-de-sac
- FR2: Frontage on 2 sides of property
- FR3: Frontage on 3 sides of property
LandSlope - Identifica o declive da propriedade.
- Gtl: Gentle slope
- Mod: Moderate Slope
- Sev: Severe Slope
Neighborhood - Identifica as localidades físicas dentro dos limites da cidade de Ames.
- Blmngtn: Bloomington Heights
- Blueste: Bluestem
- BrDale: Briardale
- BrkSide: Brookside
- ClearCr: Clear Creek
- CollgCr: College Creek
- Crawfor: Crawford
- Edwards: Edwards
- Gilbert: Gilbert
- IDOTRR: Iowa DOT and Rail Road
- MeadowV: Meadow Village
- Mitchel: Mitchell
- Names: North Ames
- NoRidge: Northridge
- NPkVill: Northpark Villa
- NridgHt: Northridge Heights
- NWAmes: Northwest Ames
- OldTown: Old Town
- SWISU: South & West of Iowa State University
- Sawyer: Sawyer
- SawyerW: Sawyer West
- Somerst: Somerset
- StoneBr: Stone Brook
- Timber: Timberland
- Veenker: Veenker
Condition1 - Identifica a proximidade a condições diversas.
- Artery: Adjacent to arterial street
- Feedr: Adjacent to feeder street
- Norm: Normal
- RRNn: Within 200’ of North-South Railroad
- RRAn: Adjacent to North-South Railroad
- PosN: Near positive off-site feature–park, greenbelt, etc.
- PosA: Adjacent to postive off-site feature
- RRNe: Within 200’ of East-West Railroad
- RRAe: Adjacent to East-West Railroad
Condition2 - Identifica a proximidade a condições diversas (se mais de uma está presente).
- Artery: Adjacent to arterial street
- Feedr: Adjacent to feeder street
- Norm: Normal
- RRNn: Within 200’ of North-South Railroad
- RRAn: Adjacent to North-South Railroad
- PosN: Near positive off-site feature–park, greenbelt, etc.
- PosA: Adjacent to postive off-site feature
- RRNe: Within 200’ of East-West Railroad
- RRAe: Adjacent to East-West Railroad
BldgType - Identfica o tipo de habitação.
- 1Fam: Single-family Detached
- 2FmCon: Two-family Conversion; originally built as one-family dwelling
- Duplx: Duplex
- TwnhsE: Townhouse End Unit
- TwnhsI: Townhouse Inside Unit
HouseStyle - Identifica o estilo da habitação.
- 1Story: One story
- 1.5Fin: One and one-half story: 2nd level finished
- 1.5Unf: One and one-half story: 2nd level unfinished
- 2Story: Two story
- 2.5Fin: Two and one-half story: 2nd level finished
- 2.5Unf: Two and one-half story: 2nd level unfinished
- SFoyer: Split Foyer
- SLvl: Split Level
OverallQual - Avaliação geral do material e acabamento da casa.
- 10: Very Excellent
- 9: Excellent
- 8: Very Good
- 7: Good
- 6: Above Average
- 5: Average
- 4: Below Average
- 3: Fair
- 2: Poor
- 1: Very Poor
OverallCond - Avaliação geral da condição da casa.
- 10: Very Excellent
- 9: Excellent
- 8: Very Good
- 7: Good
- 6: Above Average
- 5: Average
- 4: Below Average
- 3: Fair
- 2: Poor
- 1: Very Poor
YearBuilt - Ano em que a casa foi contruída.
YearRemodAdd - Ano em que a casa foi remodelada (será igual a da construção se nenhuma adição ou remodelagem foi feita).
RoofStyle - Identifica o tipo de telhado.
- Flat: Flat
- Gable: Gable
- Gambrel: Gabrel (Barn)
- Hip: Hip
- Mansard: Mansard
- Shed: Shed
RoofMatl - Itentifica o tipo e material do telhado.
- ClyTile: Clay or Tile
- CompShg: Standard (Composite) Shingle
- Membran: Membrane
- Metal: Metal
- Roll: Roll
- Tar&Grv: Gravel & Tar
- WdShake: Wood Shakes
- WdShngl: Wood Shingles
Exterior1st - Identifica o acabamento externo da casa.
- AsbShng: Asbestos Shingles
- AsphShn: Asphalt Shingles
- BrkComm: Brick Common
- BrkFace: Brick Face
- CBlock: Cinder Block
- CemntBd: Cement Board
- HdBoard: Hard Board
- ImStucc: Imitation Stucco
- MetalSd: Metal Siding
- Other: Other
- Plywood: Plywood
- PreCast: PreCast
- Stone: Stone
- Stucco: Stucco
- VinylSd: Vinyl Siding
- Wd Sdng: Wood Siding
- WdShing: Wood Shingles
Exterior2nd - Identifica o acabamento externo da casa (se exixtir mais de um).
- AsbShng: Asbestos Shingles
- AsphShn: Asphalt Shingles
- BrkComm: Brick Common
- BrkFace: Brick Face
- CBlock: Cinder Block
- CemntBd: Cement Board
- HdBoard: Hard Board
- ImStucc: Imitation Stucco
- MetalSd: Metal Siding
- Other: Other
- Plywood: Plywood
- PreCast: PreCast
- Stone: Stone
- Stucco: Stucco
- VinylSd: Vinyl Siding
- Wd Sdng: Wood Siding
- WdShing: Wood Shingles
MasVnrType - Identifica o tipo de alvenaria.
- BrkCmn: Brick Common
- BrkFace: Brick Face
- CBlock: Cinder Block
- None: None
- Stone: Stone
MasVnrArea - Identifica a área em pés quadrados da alvenaria.
ExterQual - Avaliação da qualidade do material do exterior da construção.
- Ex: Excellent
- Gd: Good
- TA: Average/Typical
- Fa: Fair
- Po: Poor
ExterCond - Avaliação da condição atual do material do exterior da contrução.
- Ex: Excellent
- Gd: Good
- TA: Average/Typical
- Fa: Fair
- Po: Poor
Foundation - Identifica o tipo de fundação.
- BrkTil: Brick & Tile
- CBlock: Cinder Block
- PConc: Poured Contrete
- Slab: Slab
- Stone: Stone
- Wood: Wood
BsmtQual - Avaliação da altura do porão.
- Ex: Excellent (100+ inches)
- Gd: Good (90-99 inches)
- TA: Typical (80-89 inches)
- Fa: Fair (70-79 inches)
- Po: Poor (<70 inches
- NA: No Basement
BsmtCond - Avaliação da condição geral do porão.
- Ex: Excellent
- Gd: Good
- TA: Typical - slight dampness allowed
- Fa: Fair - dampness or some cracking or settling
- Po: Poor - Severe cracking, settling, or wetness
- NA: No Basement
BsmtExposure - Identifica o tipo de exposição do porão para o jardim.
- Gd: Good Exposure
- Av: Average Exposure (split levels or foyers typically score average or above)
- Mn: Mimimum Exposure
- No: No Exposure
- NA: No Basement
BsmtFinType1 - Classificação da área contruída tipo 1 do porão.
- GLQ: Good Living Quarters
- ALQ: Average Living Quarters
- BLQ: Below Average Living Quarters
- Rec: Average Rec Room
- LwQ: Low Quality
- Unf: Unfinshed
- NA: No Basement
BsmtFinSF1 - Área contruída tipo 1 do porão em pés quadrados.
BsmtFinType2 - Classificação da área contruída tipo 2 do porão (se mais de um).
- GLQ: Good Living Quarters
- ALQ: Average Living Quarters
- BLQ: Below Average Living Quarters
- Rec: Average Rec Room
- LwQ: Low Quality
- Unf: Unfinshed
- NA: No Basement
BsmtFinSF2 - Área contruída tipo 1 do porão em pés quadrados.
BsmtUnfSF - Área não contruída do porão em pés quadrados.
TotalBsmtSF - Área total do porão em pés quadrados.
Heating - Identifica o tipo de sistema de aquecimento.
- Floor: Floor Furnace
- GasA: Gas forced warm air furnace
- GasW: Gas hot water or steam heat
- Grav: Gravity furnace
- OthW: Hot water or steam heat other than gas
- Wall: Wall furnace
HeatingQC - Avaliação da qualidade e condição dos sistema de aquecimento.
- Ex: Excellent
- Gd: Good
- TA: Average/Typical
- Fa: Fair
- Po: Poor
CentralAir - Identifica a existência de sistema de ar condicionado.
- N: No
- Y: Yes
Electrical - Identifica o tipo de sistema elétrico.
- SBrkr: Standard Circuit Breakers & Romex
- FuseA: Fuse Box over 60 AMP and all Romex wiring (Average)
- FuseF: 60 AMP Fuse Box and mostly Romex wiring (Fair)
- FuseP: 60 AMP Fuse Box and mostly knob & tube wiring (poor)
- Mix: Mixed
1stFlrSF - Área do primeiro andar em pés quadrados.
2ndFlrSF - Área do segundo andar em pés quadrados.
LowQualFinSF - Área com acabamento de baixa qualidade em pés quadrados (todos os andares).
GrLivArea - Área habitável acima do solo em pés quadrados.
BsmtFullBath - Quantidade de banheiros completos no porão.
BsmtHalfBath - Quantidade de lavabos no porão.
FullBath - Quantidade de banheiros completos na área habitável acima do solo.
HalfBath - Quantidade de lavabos na área habitável acima do solo.
Bedroom - Quantidade de quartos na área habitável acima do solo (Não inclui o porão).
Kitchen - Quantidade de cozinhas na área habitável acima do solo.
KitchenQual - Avaliação da qualidade da cozinha.
- Ex: Excellent
- Gd: Good
- TA: Typical/Average
- Fa: Fair
- Po: Poor
TotRmsAbvGrd - Quantidade total de cômodoss na área habitável acima do solo (Não inclui banheiros).
Functional - Idintifica a funcionaidade da propriedade (assume como “Típica” a menos que haja alguma dedução).
- Typ: Typical Functionality
- Min1: Minor Deductions 1
- Min2: Minor Deductions 2
- Mod: Moderate Deductions
- Maj1: Major Deductions 1
- Maj2: Major Deductions 2
- Sev: Severely Damaged
- Sal: Salvage only
Fireplaces - Quantidade de lareiras.
FireplaceQu - Avaliação da qualidade da lareira.
- Ex: Excellent - Exceptional Masonry Fireplace
- Gd: Good - Masonry Fireplace in main level
- TA: Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
- Fa: Fair - Prefabricated Fireplace in basement
- Po: Poor - Ben Franklin Stove
- NA: No Fireplace
GarageType - Identifica a localização da garagem.
- 2Types: More than one type of garage
- Attchd: Attached to home
- Basment: Basement Garage
- BuiltIn: Built-In (Garage part of house - typically has room above garage)
- CarPort: Car Port
- Detchd: Detached from home
- NA: No Garage
GarageYrBlt - Ano de contrução da garagem.
GarageFinish - Identifica o acabamento interno da garagem.
- Fin: Finished
- RFn: Rough Finished
- Unf: Unfinished
- NA: No Garage
GarageCars - Quantidade de carros que a garagem comporta.
GarageArea - Tamanho da garagem em pés quadrados.
GarageQual - Identifica a qualidade da garagem.
- Ex: Excellent
- Gd: Good
- TA: Typical/Average
- Fa: Fair
- Po: Poor
- NA: No Garage
GarageCond - Identifica a condição da garagem.
- Ex: Excellent
- Gd: Good
- TA: Typical/Average
- Fa: Fair
- Po: Poor
- NA: No Garage
PavedDrive - Identifica se a entrada da gragem é pavimentada.
- Y: Paved
- P: Partial Pavement
- N: Dirt/Gravel
WoodDeckSF - Área do deck de madeira em pés quadrados.
OpenPorchSF - Área da varanda em pés quadrados.
EnclosedPorch - Área da varanda fechada (telhado) em pés quadrados.
3SsnPorch - Área da varanda fechada (3 seasons) em pés quadrados.
ScreenPorch - Área da varanda fechada (tela) em pés quadrados.
PoolArea - Área da da piscina em pés quadrados.
PoolQC - Identifica a qualidade da piscina.
- Ex: Excellent
- Gd: Good
- TA: Average/Typical
- Fa: Fair
- NA: No Pool
Fence - Identifica a qualidade da cerca.
- GdPrv: Good Privacy
- MnPrv: Minimum Privacy
- GdWo: Good Wood
- MnWw: Minimum Wood/Wire
- NA: No Fence
MiscFeature - Identifica coisas variadas que não foram cobertas nas categorias anteriores.
- Elev: Elevator
- Gar2: 2nd Garage (if not described in garage section)
- Othr: Other
- Shed: Shed (over 100 SF)
- TenC: Tennis Court
- NA: None
MiscVal - Valor em dólares das coisas variádas descritas acima.
MoSold - Mês da venda (MM).
YrSold - Ano da venda (YYYY).
SaleType - Identifica o tipo da venda.
- WD: Warranty Deed - Conventional
- CWD: Warranty Deed - Cash
- VWD: Warranty Deed - VA Loan
- New: Home just constructed and sold
- COD: Court Officer Deed/Estate
- Con: Contract 15% Down payment regular terms
- ConLw: Contract Low Down payment and low interest
- ConLI: Contract Low Interest
- ConLD: Contract Low Down
- Oth: Other
SaleCondition - Condição da venda.
- Normal: Normal Sale
- Abnorml: Abnormal Sale - trade, foreclosure, short sale
- AdjLand: Adjoining Land Purchase
- Alloca: Allocation - two linked properties with separate deeds, typically condo with a garage unit
- Family: Sale between family members
- Partial: Home was not completed when last assessed (associated with
New Homes)
SalesPrice - O preço de venda da casa. Este é o target, a coluna que você está tentando prever.
test.csv - Registros de cerca de metade (~1459) das casas, para serem usados como dados de teste.
Sua tarefa é prever o valor de Transportado para os passageiros neste conjunto.
sample_submission.csv - Um arquivo de envio no formato correto.Id - Id para cada casa no conjunto de teste.
SalesPrice - O alvo. Para cada casa, preveja o valor da venda associado a ela.
Preparação dos Dados [Data Preparation]
Importação dos Dados
Vamos importar os 3 bancos: Train, Test e Sample_Submission.
ds2_train <- read_csv(".\\house-prices-regression\\train.csv")
ds2_test <- read_csv(".\\house-prices-regression\\test.csv")
ds2_submission <- read_csv(".\\house-prices-regression\\sample_submission.csv")Antes de ajustar as variáveis, vamos combinar os dados para que as
mudanças sejam feitas de forma mais rápida.
#merge between test & submission
ds2_test_submission <- merge(ds2_test, ds2_submission, by=c("Id", "Id"))
ds2_train$table <- "train"
ds2_test_submission$table <- "test"
ds2_all <- rbind(ds2_train, ds2_test_submission)Ajuste das Variáveis
Esse banco de dados consiste basicamente em variáveis qualitativas em
sua grande maioria. Ainda sim algumas mudanças precisam ser feitas para
que possamos realizar a modelagem.
As variáveis OverallQal e OverallCond
são Escalas de Likert pura e simplemente com 10 níveis de
classificação. Existem outras variáveis no banco como
ExterQual, ExterCond,
BsmtQual, BsmtCond,
HeatingQC, KitchenQual,
FireplaceQu, GarageQual,
GarageCond e PoolQC também são escalas
de Likert contudo temos letras ao invés de números e essas letras também
representam uma ordem. Vamos fazer a modificação dessas variáveis,
principalmente para o caso em que uma das possíveis respostas para elas
é “NA”, indicando que essa pergunta não se aplica naquela moradia e não
deve ser removido como se fosse valor faltante.
Variáveis relativas ao porão por exemplo, só serão respondidas se a casa
possui porão, então faz sentido criar uma variável dummy respondendo
essa pergunta simples pois pode ser mais significante no modelo saber se
tem ou não porão. O mesmo para a variável sobre a qualidade da lareira,
quantidade de garagens, condição da piscina, condição da cerca e a
variável MiscFeature.
Snapshot:
head(ds2_all) %>% kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | table |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60 | RL | 65 | 8450 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NA | Attchd | 2003 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 2 | 2008 | WD | Normal | 208500 | train |
| 2 | 20 | RL | 80 | 9600 | Pave | NA | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 5 | 2007 | WD | Normal | 181500 | train |
| 3 | 60 | RL | 68 | 11250 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 9 | 2008 | WD | Normal | 223500 | train |
| 4 | 70 | RL | 60 | 9550 | Pave | NA | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NA | NA | NA | 0 | 2 | 2006 | WD | Abnorml | 140000 | train |
| 5 | 60 | RL | 84 | 14260 | Pave | NA | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 12 | 2008 | WD | Normal | 250000 | train |
| 6 | 50 | RL | 85 | 14115 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | Mitchel | Norm | Norm | 1Fam | 1.5Fin | 5 | 5 | 1993 | 1995 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | Wood | Gd | TA | No | GLQ | 732 | Unf | 0 | 64 | 796 | GasA | Ex | Y | SBrkr | 796 | 566 | 0 | 1362 | 1 | 0 | 1 | 1 | 1 | 1 | TA | 5 | Typ | 0 | NA | Attchd | 1993 | Unf | 2 | 480 | TA | TA | Y | 40 | 30 | 0 | 320 | 0 | 0 | NA | MnPrv | Shed | 700 | 10 | 2009 | WD | Normal | 143000 | train |
Observações faltantes
#h_train1_summary0 <- as.data.frame(which( colSums( is.na(ds2_all[ds2_all$table == "train",]) ) > 0 ))
#colnames(h_train1_summary3) <- c("Missing Observations")
h_train1_summary0 <- as.data.frame(t(ds2_all[ds2_all$table == "train",] %>%
summarise(across(everything(), ~ sum(is.na(.x)) ))))
colnames(h_train1_summary0) <- c("Missing Observations")
h_train1_summary0 %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T) %>%
column_spec(2, color = "black",
background = ifelse(h_train1_summary0 > 0, "yellow", "blank")
#,
#popover = paste("am:", mtcars$am[1:8])
)| Missing Observations | |
|---|---|
| Id | 0 |
| MSSubClass | 0 |
| MSZoning | 0 |
| LotFrontage | 259 |
| LotArea | 0 |
| Street | 0 |
| Alley | 1369 |
| LotShape | 0 |
| LandContour | 0 |
| Utilities | 0 |
| LotConfig | 0 |
| LandSlope | 0 |
| Neighborhood | 0 |
| Condition1 | 0 |
| Condition2 | 0 |
| BldgType | 0 |
| HouseStyle | 0 |
| OverallQual | 0 |
| OverallCond | 0 |
| YearBuilt | 0 |
| YearRemodAdd | 0 |
| RoofStyle | 0 |
| RoofMatl | 0 |
| Exterior1st | 0 |
| Exterior2nd | 0 |
| MasVnrType | 8 |
| MasVnrArea | 8 |
| ExterQual | 0 |
| ExterCond | 0 |
| Foundation | 0 |
| BsmtQual | 37 |
| BsmtCond | 37 |
| BsmtExposure | 38 |
| BsmtFinType1 | 37 |
| BsmtFinSF1 | 0 |
| BsmtFinType2 | 38 |
| BsmtFinSF2 | 0 |
| BsmtUnfSF | 0 |
| TotalBsmtSF | 0 |
| Heating | 0 |
| HeatingQC | 0 |
| CentralAir | 0 |
| Electrical | 1 |
| 1stFlrSF | 0 |
| 2ndFlrSF | 0 |
| LowQualFinSF | 0 |
| GrLivArea | 0 |
| BsmtFullBath | 0 |
| BsmtHalfBath | 0 |
| FullBath | 0 |
| HalfBath | 0 |
| BedroomAbvGr | 0 |
| KitchenAbvGr | 0 |
| KitchenQual | 0 |
| TotRmsAbvGrd | 0 |
| Functional | 0 |
| Fireplaces | 0 |
| FireplaceQu | 690 |
| GarageType | 81 |
| GarageYrBlt | 81 |
| GarageFinish | 81 |
| GarageCars | 0 |
| GarageArea | 0 |
| GarageQual | 81 |
| GarageCond | 81 |
| PavedDrive | 0 |
| WoodDeckSF | 0 |
| OpenPorchSF | 0 |
| EnclosedPorch | 0 |
| 3SsnPorch | 0 |
| ScreenPorch | 0 |
| PoolArea | 0 |
| PoolQC | 1453 |
| Fence | 1179 |
| MiscFeature | 1406 |
| MiscVal | 0 |
| MoSold | 0 |
| YrSold | 0 |
| SaleType | 0 |
| SaleCondition | 0 |
| SalePrice | 0 |
| table | 0 |
Porão
Note que as variáveis que são relacionadas com o
Porão possuem quase a mesma quantidade de observações
faltantes, idicando que possivelmente se a casa não tem porão, as depois
variáveis relacioandas a essa parte da casa fazem sentido serem nulas.
Contudo, não temos uma variável na base que indica claramente se a casa
possui ou não porão e para fins desse modelo, seria conveniente ter uma
variável binária dizendo se a casa tem ou não porão.
Vamos então analisar os valores faltantes dessa variável e ver se de
fato todas os faltantes são casas que não possuem porão.
ds2_all[ds2_all$table == "train",] %>%
dplyr::select("Id" | contains("Bsm")) %>%
filter_all(any_vars( is.na(.))) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | BsmtFullBath | BsmtHalfBath |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 40 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 91 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 103 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 157 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 183 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 260 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 333 | Gd | TA | No | GLQ | 1124 | NA | 479 | 1603 | 3206 | 1 | 0 |
| 343 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 363 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 372 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 393 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 521 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 533 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 534 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 554 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 647 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 706 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 737 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 750 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 779 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 869 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 895 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 898 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 949 | Gd | TA | NA | Unf | 0 | Unf | 0 | 936 | 936 | 0 | 0 |
| 985 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1001 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1012 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1036 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1046 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1049 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1050 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1091 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1180 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1217 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1219 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1233 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1322 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
| 1413 | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | 0 | 0 |
Repare que a casa de ID = 333 tem uma variável
faltante no campo BsmtFinType2, o que deveria indicar
que a casa não tem porão, contude a variável
BsmtFinType1 foi preenchida e também a variável que se
relaciona com a metragem contruida no tipo 2,
BsmtFinSF2, indicando que de fato, a observação
333 tem um valor missing e que não sabemos qual a classificação
dada para essa parte da casa.
A casa de ID = 949 parece mostrar que a casa possui um
porão, mas não sabemos mais detalhes sobre o mesmo, parece não foi
terminado e somente temos a metragem final. Nesse caso, ter BsmtExposure
= NA significaria que a casa não possui porão o que não parenta ser
verdade. Isso indica que 949 possui um valor faltante real. O
mesmo ocorre com a casa de ID = 1488.
Já a casa de ID = 2041 possui um valor faltante na
variável BsmtCond, indicando que esse também é um valor
faltante real. O mesmo ocorre para as casa de IDs 2186
e 2525.
As casas com ID = 2218 e 2219 possuem
valores faltantes reais na variável BsmtQual.
A casa com ID = 2349 possue um valor faltante real para
a variável BsmtExposure.
Ou seja, a maioria dos valores faltantes se referem à casas que não
possuem porão, com exceção dos descritos acima, isso deve ser
considerado quando formos criar uma variável binária indicativa de ter
ou não porão na casa.
Para fins de estudo, decidimos que se:
- BsmtQual = NA - BsmtCond = NA - BsmtExposure = NA - BsmtFinType1 = NA
- BsmtFinSF1 = 0 - BsmtFinType2 = NA - BsmtFinSF2 = 0 - BsmtUnfSF = 0 -
TotalBsmtSF = 0
então temos que a casa não possui um porão, caso contrário vamos dizer
que possui. Somente com essa reclassificação já poderemos tirar os
valores reais faltantes do que não são reais.
Alvenaria
Similarmente, temos para se a casa possui ou não alvernaria:
- MasVnrType = None - MasVnrArea = 0
ambas as condições tem que ocorrer.
ds2_all[ds2_all$table == "train",] %>%
dplyr::select("Id" | contains("MasVnr")) %>%
filter_all(any_vars(grepl("None",.))) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | MasVnrType | MasVnrArea |
|---|---|---|
| 2 | None | 0 |
| 4 | None | 0 |
| 6 | None | 0 |
| 9 | None | 0 |
| 10 | None | 0 |
| 11 | None | 0 |
| 13 | None | 0 |
| 16 | None | 0 |
| 18 | None | 0 |
| 19 | None | 0 |
| 20 | None | 0 |
| 22 | None | 0 |
| 24 | None | 0 |
| 25 | None | 0 |
| 27 | None | 0 |
| 29 | None | 0 |
| 30 | None | 0 |
| 31 | None | 0 |
| 32 | None | 0 |
| 33 | None | 0 |
| 34 | None | 0 |
| 37 | None | 0 |
| 39 | None | 0 |
| 40 | None | 0 |
| 42 | None | 0 |
| 43 | None | 0 |
| 44 | None | 0 |
| 45 | None | 0 |
| 47 | None | 0 |
| 48 | None | 0 |
| 49 | None | 0 |
| 50 | None | 0 |
| 51 | None | 0 |
| 52 | None | 0 |
| 53 | None | 0 |
| 54 | None | 0 |
| 55 | None | 0 |
| 58 | None | 0 |
| 60 | None | 0 |
| 61 | None | 0 |
| 62 | None | 0 |
| 64 | None | 0 |
| 69 | None | 0 |
| 70 | None | 0 |
| 72 | None | 0 |
| 75 | None | 0 |
| 76 | None | 0 |
| 77 | None | 0 |
| 78 | None | 0 |
| 79 | None | 0 |
| 80 | None | 0 |
| 87 | None | 0 |
| 89 | None | 0 |
| 90 | None | 0 |
| 91 | None | 0 |
| 93 | None | 0 |
| 94 | None | 0 |
| 95 | None | 0 |
| 99 | None | 0 |
| 100 | None | 0 |
| 103 | None | 0 |
| 104 | None | 0 |
| 107 | None | 0 |
| 108 | None | 0 |
| 109 | None | 0 |
| 111 | None | 0 |
| 112 | None | 0 |
| 115 | None | 0 |
| 118 | None | 0 |
| 119 | None | 0 |
| 120 | None | 0 |
| 121 | None | 0 |
| 122 | None | 0 |
| 124 | None | 0 |
| 125 | None | 0 |
| 126 | None | 0 |
| 127 | None | 0 |
| 128 | None | 0 |
| 133 | None | 0 |
| 135 | None | 0 |
| 140 | None | 0 |
| 141 | None | 0 |
| 142 | None | 0 |
| 143 | None | 0 |
| 147 | None | 0 |
| 150 | None | 0 |
| 151 | None | 0 |
| 154 | None | 0 |
| 155 | None | 0 |
| 156 | None | 0 |
| 157 | None | 0 |
| 159 | None | 0 |
| 161 | None | 0 |
| 164 | None | 0 |
| 165 | None | 0 |
| 166 | None | 0 |
| 167 | None | 0 |
| 169 | None | 0 |
| 171 | None | 0 |
| 173 | None | 0 |
| 176 | None | 0 |
| 178 | None | 0 |
| 180 | None | 0 |
| 182 | None | 0 |
| 184 | None | 0 |
| 185 | None | 0 |
| 186 | None | 0 |
| 187 | None | 0 |
| 188 | None | 0 |
| 190 | None | 0 |
| 191 | None | 0 |
| 193 | None | 0 |
| 195 | None | 0 |
| 196 | None | 0 |
| 198 | None | 0 |
| 199 | None | 0 |
| 201 | None | 0 |
| 202 | None | 0 |
| 203 | None | 0 |
| 205 | None | 0 |
| 206 | None | 0 |
| 207 | None | 0 |
| 208 | None | 0 |
| 211 | None | 0 |
| 212 | None | 0 |
| 213 | None | 0 |
| 214 | None | 0 |
| 218 | None | 0 |
| 221 | None | 0 |
| 222 | None | 0 |
| 223 | None | 0 |
| 224 | None | 0 |
| 229 | None | 0 |
| 234 | None | 0 |
| 238 | None | 0 |
| 240 | None | 0 |
| 242 | None | 0 |
| 243 | None | 0 |
| 244 | None | 0 |
| 245 | None | 0 |
| 247 | None | 0 |
| 248 | None | 0 |
| 251 | None | 0 |
| 253 | None | 0 |
| 255 | None | 0 |
| 257 | None | 0 |
| 260 | None | 0 |
| 264 | None | 0 |
| 265 | None | 0 |
| 267 | None | 0 |
| 268 | None | 0 |
| 269 | None | 0 |
| 271 | None | 0 |
| 272 | None | 0 |
| 275 | None | 0 |
| 276 | None | 0 |
| 277 | None | 0 |
| 278 | None | 0 |
| 285 | None | 0 |
| 286 | None | 0 |
| 287 | None | 0 |
| 288 | None | 0 |
| 290 | None | 0 |
| 291 | None | 0 |
| 292 | None | 0 |
| 293 | None | 0 |
| 296 | None | 0 |
| 297 | None | 0 |
| 300 | None | 0 |
| 302 | None | 0 |
| 304 | None | 0 |
| 305 | None | 0 |
| 308 | None | 0 |
| 309 | None | 0 |
| 310 | None | 0 |
| 312 | None | 0 |
| 313 | None | 0 |
| 314 | None | 0 |
| 315 | None | 0 |
| 316 | None | 0 |
| 318 | None | 0 |
| 324 | None | 0 |
| 326 | None | 0 |
| 327 | None | 0 |
| 329 | None | 0 |
| 330 | None | 0 |
| 332 | None | 0 |
| 335 | None | 0 |
| 336 | None | 0 |
| 339 | None | 0 |
| 341 | None | 0 |
| 342 | None | 0 |
| 345 | None | 0 |
| 346 | None | 0 |
| 347 | None | 0 |
| 352 | None | 0 |
| 353 | None | 0 |
| 354 | None | 0 |
| 355 | None | 0 |
| 356 | None | 0 |
| 358 | None | 0 |
| 361 | None | 0 |
| 362 | None | 0 |
| 366 | None | 0 |
| 370 | None | 0 |
| 371 | None | 0 |
| 372 | None | 0 |
| 373 | None | 0 |
| 374 | None | 0 |
| 375 | None | 0 |
| 376 | None | 0 |
| 377 | None | 0 |
| 378 | None | 0 |
| 381 | None | 0 |
| 382 | None | 0 |
| 383 | None | 0 |
| 384 | None | 0 |
| 385 | None | 0 |
| 387 | None | 0 |
| 391 | None | 0 |
| 392 | None | 0 |
| 393 | None | 0 |
| 394 | None | 0 |
| 395 | None | 0 |
| 396 | None | 0 |
| 397 | None | 0 |
| 399 | None | 0 |
| 401 | None | 0 |
| 402 | None | 0 |
| 403 | None | 0 |
| 405 | None | 0 |
| 406 | None | 0 |
| 407 | None | 0 |
| 408 | None | 0 |
| 411 | None | 0 |
| 412 | None | 0 |
| 414 | None | 0 |
| 415 | None | 0 |
| 416 | None | 0 |
| 418 | None | 0 |
| 419 | None | 0 |
| 420 | None | 0 |
| 423 | None | 0 |
| 426 | None | 0 |
| 428 | None | 0 |
| 430 | None | 0 |
| 432 | None | 0 |
| 434 | None | 0 |
| 435 | None | 0 |
| 436 | None | 0 |
| 437 | None | 0 |
| 438 | None | 0 |
| 439 | None | 0 |
| 440 | None | 0 |
| 443 | None | 0 |
| 445 | None | 0 |
| 446 | None | 0 |
| 447 | None | 0 |
| 448 | None | 0 |
| 449 | None | 0 |
| 450 | None | 0 |
| 451 | None | 0 |
| 452 | None | 0 |
| 454 | None | 0 |
| 455 | None | 0 |
| 457 | None | 0 |
| 458 | None | 0 |
| 459 | None | 0 |
| 462 | None | 0 |
| 463 | None | 0 |
| 464 | None | 0 |
| 467 | None | 0 |
| 471 | None | 0 |
| 472 | None | 0 |
| 475 | None | 0 |
| 476 | None | 0 |
| 483 | None | 0 |
| 485 | None | 0 |
| 486 | None | 0 |
| 488 | None | 0 |
| 489 | None | 0 |
| 490 | None | 0 |
| 491 | None | 0 |
| 492 | None | 0 |
| 493 | None | 0 |
| 494 | None | 0 |
| 495 | None | 0 |
| 496 | None | 0 |
| 497 | None | 0 |
| 498 | None | 0 |
| 500 | None | 0 |
| 502 | None | 0 |
| 503 | None | 0 |
| 504 | None | 0 |
| 505 | None | 0 |
| 508 | None | 0 |
| 509 | None | 0 |
| 513 | None | 0 |
| 514 | None | 0 |
| 515 | None | 0 |
| 519 | None | 0 |
| 520 | None | 0 |
| 521 | None | 0 |
| 522 | None | 0 |
| 523 | None | 0 |
| 526 | None | 0 |
| 527 | None | 0 |
| 529 | None | 0 |
| 532 | None | 0 |
| 533 | None | 0 |
| 534 | None | 0 |
| 535 | None | 0 |
| 536 | None | 0 |
| 537 | None | 0 |
| 538 | None | 0 |
| 543 | None | 0 |
| 545 | None | 0 |
| 547 | None | 0 |
| 548 | None | 0 |
| 549 | None | 0 |
| 550 | None | 0 |
| 551 | None | 0 |
| 552 | None | 0 |
| 554 | None | 0 |
| 556 | None | 0 |
| 557 | None | 0 |
| 558 | None | 0 |
| 559 | None | 0 |
| 562 | None | 0 |
| 563 | None | 0 |
| 564 | None | 0 |
| 565 | None | 0 |
| 566 | None | 0 |
| 569 | None | 0 |
| 570 | None | 0 |
| 573 | None | 0 |
| 574 | None | 0 |
| 576 | None | 0 |
| 577 | None | 0 |
| 579 | None | 0 |
| 584 | None | 0 |
| 585 | None | 0 |
| 587 | None | 0 |
| 588 | None | 0 |
| 589 | None | 0 |
| 590 | None | 0 |
| 591 | None | 0 |
| 593 | None | 0 |
| 595 | None | 0 |
| 597 | None | 0 |
| 600 | None | 0 |
| 602 | None | 0 |
| 603 | None | 0 |
| 604 | None | 0 |
| 607 | None | 0 |
| 608 | None | 0 |
| 609 | None | 0 |
| 614 | None | 0 |
| 615 | None | 0 |
| 617 | None | 0 |
| 618 | None | 0 |
| 621 | None | 0 |
| 622 | None | 0 |
| 625 | None | 288 |
| 627 | None | 0 |
| 631 | None | 0 |
| 634 | None | 0 |
| 635 | None | 0 |
| 636 | None | 0 |
| 637 | None | 0 |
| 638 | None | 0 |
| 639 | None | 0 |
| 640 | None | 0 |
| 642 | None | 0 |
| 643 | None | 0 |
| 646 | None | 0 |
| 647 | None | 0 |
| 650 | None | 0 |
| 652 | None | 0 |
| 653 | None | 0 |
| 654 | None | 0 |
| 658 | None | 0 |
| 659 | None | 0 |
| 660 | None | 0 |
| 664 | None | 0 |
| 665 | None | 0 |
| 669 | None | 0 |
| 670 | None | 0 |
| 671 | None | 0 |
| 672 | None | 0 |
| 673 | None | 0 |
| 675 | None | 0 |
| 676 | None | 0 |
| 677 | None | 0 |
| 678 | None | 0 |
| 681 | None | 0 |
| 682 | None | 0 |
| 683 | None | 0 |
| 686 | None | 0 |
| 687 | None | 0 |
| 688 | None | 0 |
| 694 | None | 0 |
| 695 | None | 0 |
| 697 | None | 0 |
| 698 | None | 0 |
| 699 | None | 0 |
| 700 | None | 0 |
| 704 | None | 0 |
| 706 | None | 0 |
| 709 | None | 0 |
| 711 | None | 0 |
| 712 | None | 0 |
| 713 | None | 0 |
| 717 | None | 0 |
| 718 | None | 0 |
| 720 | None | 0 |
| 721 | None | 0 |
| 723 | None | 0 |
| 724 | None | 0 |
| 726 | None | 0 |
| 730 | None | 0 |
| 731 | None | 0 |
| 734 | None | 0 |
| 735 | None | 0 |
| 736 | None | 0 |
| 737 | None | 0 |
| 738 | None | 0 |
| 739 | None | 0 |
| 740 | None | 0 |
| 741 | None | 0 |
| 742 | None | 0 |
| 744 | None | 0 |
| 745 | None | 0 |
| 747 | None | 0 |
| 748 | None | 0 |
| 750 | None | 0 |
| 751 | None | 0 |
| 752 | None | 0 |
| 753 | None | 0 |
| 755 | None | 0 |
| 757 | None | 0 |
| 761 | None | 0 |
| 762 | None | 0 |
| 763 | None | 0 |
| 765 | None | 0 |
| 768 | None | 0 |
| 769 | None | 0 |
| 771 | None | 0 |
| 772 | None | 0 |
| 773 | None | 0 |
| 774 | None | 1 |
| 778 | None | 0 |
| 782 | None | 0 |
| 783 | None | 0 |
| 785 | None | 0 |
| 786 | None | 0 |
| 787 | None | 0 |
| 788 | None | 0 |
| 789 | None | 0 |
| 792 | None | 0 |
| 793 | None | 0 |
| 795 | None | 0 |
| 798 | None | 0 |
| 801 | None | 0 |
| 802 | None | 0 |
| 803 | None | 0 |
| 805 | None | 0 |
| 807 | None | 0 |
| 808 | None | 0 |
| 810 | None | 0 |
| 813 | None | 0 |
| 815 | None | 0 |
| 817 | None | 0 |
| 821 | None | 0 |
| 822 | None | 0 |
| 823 | None | 0 |
| 824 | None | 0 |
| 825 | None | 0 |
| 827 | None | 0 |
| 828 | None | 0 |
| 829 | None | 0 |
| 830 | None | 0 |
| 832 | None | 0 |
| 836 | None | 0 |
| 837 | None | 0 |
| 839 | None | 0 |
| 840 | None | 0 |
| 841 | None | 0 |
| 842 | None | 0 |
| 844 | None | 0 |
| 845 | None | 0 |
| 846 | None | 0 |
| 848 | None | 0 |
| 849 | None | 0 |
| 853 | None | 0 |
| 855 | None | 0 |
| 856 | None | 0 |
| 857 | None | 0 |
| 858 | None | 0 |
| 861 | None | 0 |
| 862 | None | 0 |
| 863 | None | 0 |
| 864 | None | 0 |
| 865 | None | 0 |
| 868 | None | 0 |
| 869 | None | 0 |
| 871 | None | 0 |
| 873 | None | 0 |
| 874 | None | 0 |
| 875 | None | 0 |
| 876 | None | 0 |
| 877 | None | 0 |
| 879 | None | 0 |
| 881 | None | 0 |
| 883 | None | 0 |
| 884 | None | 0 |
| 888 | None | 0 |
| 889 | None | 0 |
| 891 | None | 0 |
| 893 | None | 0 |
| 894 | None | 0 |
| 895 | None | 0 |
| 897 | None | 0 |
| 898 | None | 0 |
| 900 | None | 0 |
| 901 | None | 0 |
| 902 | None | 0 |
| 903 | None | 0 |
| 908 | None | 0 |
| 909 | None | 0 |
| 910 | None | 0 |
| 912 | None | 0 |
| 913 | None | 0 |
| 914 | None | 0 |
| 917 | None | 0 |
| 918 | None | 0 |
| 919 | None | 0 |
| 920 | None | 0 |
| 922 | None | 0 |
| 923 | None | 0 |
| 924 | None | 0 |
| 926 | None | 0 |
| 929 | None | 0 |
| 931 | None | 0 |
| 932 | None | 0 |
| 935 | None | 0 |
| 938 | None | 0 |
| 939 | None | 0 |
| 940 | None | 0 |
| 941 | None | 0 |
| 943 | None | 0 |
| 944 | None | 0 |
| 946 | None | 0 |
| 950 | None | 0 |
| 951 | None | 0 |
| 952 | None | 0 |
| 953 | None | 0 |
| 957 | None | 0 |
| 958 | None | 0 |
| 959 | None | 0 |
| 960 | None | 0 |
| 961 | None | 0 |
| 963 | None | 0 |
| 964 | None | 0 |
| 966 | None | 0 |
| 967 | None | 0 |
| 969 | None | 0 |
| 971 | None | 0 |
| 973 | None | 0 |
| 975 | None | 0 |
| 976 | None | 0 |
| 977 | None | 0 |
| 979 | None | 0 |
| 980 | None | 0 |
| 984 | None | 0 |
| 985 | None | 0 |
| 986 | None | 0 |
| 987 | None | 0 |
| 990 | None | 0 |
| 992 | None | 0 |
| 994 | None | 0 |
| 996 | None | 0 |
| 997 | None | 0 |
| 999 | None | 0 |
| 1001 | None | 0 |
| 1002 | None | 0 |
| 1007 | None | 0 |
| 1008 | None | 0 |
| 1010 | None | 0 |
| 1011 | None | 0 |
| 1012 | None | 0 |
| 1013 | None | 0 |
| 1014 | None | 0 |
| 1016 | None | 0 |
| 1018 | None | 0 |
| 1021 | None | 0 |
| 1023 | None | 0 |
| 1025 | None | 0 |
| 1026 | None | 0 |
| 1029 | None | 0 |
| 1031 | None | 0 |
| 1032 | None | 0 |
| 1033 | None | 0 |
| 1035 | None | 0 |
| 1036 | None | 0 |
| 1039 | None | 0 |
| 1040 | None | 0 |
| 1045 | None | 0 |
| 1046 | None | 0 |
| 1048 | None | 0 |
| 1050 | None | 0 |
| 1051 | None | 0 |
| 1052 | None | 0 |
| 1058 | None | 0 |
| 1061 | None | 0 |
| 1062 | None | 0 |
| 1063 | None | 0 |
| 1064 | None | 0 |
| 1066 | None | 0 |
| 1067 | None | 0 |
| 1069 | None | 0 |
| 1070 | None | 0 |
| 1072 | None | 0 |
| 1073 | None | 0 |
| 1075 | None | 0 |
| 1076 | None | 0 |
| 1077 | None | 0 |
| 1078 | None | 0 |
| 1080 | None | 0 |
| 1082 | None | 0 |
| 1085 | None | 0 |
| 1086 | None | 0 |
| 1087 | None | 0 |
| 1088 | None | 0 |
| 1090 | None | 0 |
| 1091 | None | 0 |
| 1093 | None | 0 |
| 1095 | None | 0 |
| 1096 | None | 0 |
| 1097 | None | 0 |
| 1098 | None | 0 |
| 1099 | None | 0 |
| 1101 | None | 0 |
| 1102 | None | 0 |
| 1104 | None | 0 |
| 1107 | None | 0 |
| 1108 | None | 0 |
| 1109 | None | 0 |
| 1111 | None | 0 |
| 1113 | None | 0 |
| 1114 | None | 0 |
| 1115 | None | 0 |
| 1117 | None | 0 |
| 1118 | None | 0 |
| 1119 | None | 0 |
| 1120 | None | 0 |
| 1121 | None | 0 |
| 1123 | None | 0 |
| 1124 | None | 0 |
| 1126 | None | 0 |
| 1128 | None | 0 |
| 1129 | None | 0 |
| 1131 | None | 0 |
| 1132 | None | 0 |
| 1133 | None | 0 |
| 1134 | None | 0 |
| 1135 | None | 0 |
| 1136 | None | 0 |
| 1137 | None | 0 |
| 1138 | None | 0 |
| 1139 | None | 0 |
| 1140 | None | 0 |
| 1141 | None | 0 |
| 1144 | None | 0 |
| 1145 | None | 0 |
| 1146 | None | 0 |
| 1148 | None | 0 |
| 1149 | None | 0 |
| 1150 | None | 0 |
| 1151 | None | 0 |
| 1153 | None | 0 |
| 1154 | None | 0 |
| 1156 | None | 0 |
| 1157 | None | 0 |
| 1161 | None | 0 |
| 1163 | None | 0 |
| 1164 | None | 0 |
| 1165 | None | 0 |
| 1168 | None | 0 |
| 1169 | None | 0 |
| 1171 | None | 0 |
| 1172 | None | 0 |
| 1173 | None | 0 |
| 1174 | None | 0 |
| 1175 | None | 0 |
| 1177 | None | 0 |
| 1178 | None | 0 |
| 1179 | None | 0 |
| 1180 | None | 0 |
| 1181 | None | 0 |
| 1183 | None | 0 |
| 1184 | None | 0 |
| 1186 | None | 0 |
| 1187 | None | 0 |
| 1188 | None | 0 |
| 1190 | None | 0 |
| 1193 | None | 0 |
| 1196 | None | 0 |
| 1197 | None | 0 |
| 1198 | None | 0 |
| 1199 | None | 0 |
| 1201 | None | 0 |
| 1202 | None | 0 |
| 1203 | None | 0 |
| 1205 | None | 0 |
| 1207 | None | 0 |
| 1209 | None | 0 |
| 1211 | None | 0 |
| 1212 | None | 0 |
| 1213 | None | 0 |
| 1214 | None | 0 |
| 1215 | None | 0 |
| 1216 | None | 0 |
| 1217 | None | 0 |
| 1219 | None | 0 |
| 1221 | None | 0 |
| 1222 | None | 0 |
| 1225 | None | 0 |
| 1230 | None | 0 |
| 1231 | None | 1 |
| 1233 | None | 0 |
| 1235 | None | 0 |
| 1236 | None | 0 |
| 1238 | None | 0 |
| 1239 | None | 0 |
| 1245 | None | 0 |
| 1247 | None | 0 |
| 1249 | None | 0 |
| 1250 | None | 0 |
| 1251 | None | 0 |
| 1253 | None | 0 |
| 1254 | None | 0 |
| 1256 | None | 0 |
| 1258 | None | 0 |
| 1260 | None | 0 |
| 1261 | None | 0 |
| 1262 | None | 0 |
| 1263 | None | 0 |
| 1264 | None | 0 |
| 1265 | None | 0 |
| 1267 | None | 0 |
| 1268 | None | 0 |
| 1271 | None | 0 |
| 1272 | None | 0 |
| 1275 | None | 0 |
| 1277 | None | 0 |
| 1280 | None | 0 |
| 1283 | None | 0 |
| 1284 | None | 0 |
| 1285 | None | 0 |
| 1286 | None | 0 |
| 1289 | None | 0 |
| 1293 | None | 0 |
| 1294 | None | 0 |
| 1295 | None | 0 |
| 1300 | None | 0 |
| 1301 | None | 344 |
| 1302 | None | 0 |
| 1308 | None | 0 |
| 1309 | None | 0 |
| 1311 | None | 0 |
| 1315 | None | 0 |
| 1316 | None | 0 |
| 1318 | None | 0 |
| 1321 | None | 0 |
| 1322 | None | 0 |
| 1323 | None | 0 |
| 1324 | None | 0 |
| 1326 | None | 0 |
| 1327 | None | 0 |
| 1328 | None | 0 |
| 1329 | None | 0 |
| 1330 | None | 0 |
| 1332 | None | 0 |
| 1333 | None | 0 |
| 1334 | None | 0 |
| 1335 | None | 312 |
| 1338 | None | 0 |
| 1339 | None | 0 |
| 1340 | None | 0 |
| 1341 | None | 0 |
| 1342 | None | 0 |
| 1344 | None | 0 |
| 1345 | None | 0 |
| 1346 | None | 0 |
| 1347 | None | 0 |
| 1349 | None | 0 |
| 1350 | None | 0 |
| 1353 | None | 0 |
| 1355 | None | 0 |
| 1358 | None | 0 |
| 1361 | None | 0 |
| 1363 | None | 0 |
| 1364 | None | 0 |
| 1365 | None | 0 |
| 1366 | None | 0 |
| 1368 | None | 0 |
| 1371 | None | 0 |
| 1372 | None | 0 |
| 1373 | None | 0 |
| 1375 | None | 0 |
| 1377 | None | 0 |
| 1378 | None | 0 |
| 1380 | None | 0 |
| 1381 | None | 0 |
| 1382 | None | 0 |
| 1383 | None | 0 |
| 1384 | None | 0 |
| 1385 | None | 0 |
| 1386 | None | 0 |
| 1388 | None | 0 |
| 1390 | None | 0 |
| 1392 | None | 0 |
| 1393 | None | 0 |
| 1394 | None | 0 |
| 1397 | None | 0 |
| 1398 | None | 0 |
| 1399 | None | 0 |
| 1400 | None | 0 |
| 1401 | None | 0 |
| 1402 | None | 0 |
| 1403 | None | 0 |
| 1405 | None | 0 |
| 1407 | None | 0 |
| 1408 | None | 0 |
| 1409 | None | 0 |
| 1411 | None | 0 |
| 1412 | None | 0 |
| 1413 | None | 0 |
| 1415 | None | 0 |
| 1417 | None | 0 |
| 1419 | None | 0 |
| 1422 | None | 0 |
| 1424 | None | 0 |
| 1425 | None | 0 |
| 1428 | None | 0 |
| 1431 | None | 0 |
| 1432 | None | 0 |
| 1433 | None | 0 |
| 1435 | None | 0 |
| 1437 | None | 0 |
| 1439 | None | 0 |
| 1441 | None | 0 |
| 1444 | None | 0 |
| 1446 | None | 0 |
| 1449 | None | 0 |
| 1450 | None | 0 |
| 1451 | None | 0 |
| 1454 | None | 0 |
| 1455 | None | 0 |
| 1456 | None | 0 |
| 1458 | None | 0 |
| 1459 | None | 0 |
| 1460 | None | 0 |
ds2_all[ds2_all$table == "train",] %>%
dplyr::select("Id" | contains("MasVnr")) %>%
filter_all(any_vars( is.na(.))) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | MasVnrType | MasVnrArea |
|---|---|---|
| 235 | NA | NA |
| 530 | NA | NA |
| 651 | NA | NA |
| 937 | NA | NA |
| 974 | NA | NA |
| 978 | NA | NA |
| 1244 | NA | NA |
| 1279 | NA | NA |
Lareira
Similarmente, temos para se a casa possui ou não lareira:
- Fireplaces = 0 - FireplaceQu = NA
ambas as condições tem que ocorrer.
ds2_all[ds2_all$table == "train",] %>%
dplyr::select("Id" | contains("Fireplace")) %>%
filter_all(any_vars( is.na(.))) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | Fireplaces | FireplaceQu |
|---|---|---|
| 1 | 0 | NA |
| 6 | 0 | NA |
| 11 | 0 | NA |
| 13 | 0 | NA |
| 16 | 0 | NA |
| 18 | 0 | NA |
| 19 | 0 | NA |
| 20 | 0 | NA |
| 27 | 0 | NA |
| 30 | 0 | NA |
| 31 | 0 | NA |
| 32 | 0 | NA |
| 33 | 0 | NA |
| 37 | 0 | NA |
| 39 | 0 | NA |
| 40 | 0 | NA |
| 43 | 0 | NA |
| 44 | 0 | NA |
| 45 | 0 | NA |
| 48 | 0 | NA |
| 49 | 0 | NA |
| 50 | 0 | NA |
| 51 | 0 | NA |
| 53 | 0 | NA |
| 57 | 0 | NA |
| 58 | 0 | NA |
| 60 | 0 | NA |
| 61 | 0 | NA |
| 62 | 0 | NA |
| 64 | 0 | NA |
| 65 | 0 | NA |
| 68 | 0 | NA |
| 69 | 0 | NA |
| 72 | 0 | NA |
| 74 | 0 | NA |
| 75 | 0 | NA |
| 76 | 0 | NA |
| 77 | 0 | NA |
| 78 | 0 | NA |
| 79 | 0 | NA |
| 80 | 0 | NA |
| 82 | 0 | NA |
| 84 | 0 | NA |
| 88 | 0 | NA |
| 89 | 0 | NA |
| 90 | 0 | NA |
| 91 | 0 | NA |
| 92 | 0 | NA |
| 93 | 0 | NA |
| 95 | 0 | NA |
| 97 | 0 | NA |
| 98 | 0 | NA |
| 99 | 0 | NA |
| 100 | 0 | NA |
| 103 | 0 | NA |
| 104 | 0 | NA |
| 107 | 0 | NA |
| 108 | 0 | NA |
| 109 | 0 | NA |
| 111 | 0 | NA |
| 118 | 0 | NA |
| 122 | 0 | NA |
| 123 | 0 | NA |
| 124 | 0 | NA |
| 126 | 0 | NA |
| 130 | 0 | NA |
| 133 | 0 | NA |
| 134 | 0 | NA |
| 138 | 0 | NA |
| 140 | 0 | NA |
| 142 | 0 | NA |
| 143 | 0 | NA |
| 144 | 0 | NA |
| 145 | 0 | NA |
| 146 | 0 | NA |
| 147 | 0 | NA |
| 149 | 0 | NA |
| 150 | 0 | NA |
| 151 | 0 | NA |
| 155 | 0 | NA |
| 156 | 0 | NA |
| 157 | 0 | NA |
| 161 | 0 | NA |
| 164 | 0 | NA |
| 165 | 0 | NA |
| 166 | 0 | NA |
| 171 | 0 | NA |
| 180 | 0 | NA |
| 184 | 0 | NA |
| 185 | 0 | NA |
| 187 | 0 | NA |
| 188 | 0 | NA |
| 192 | 0 | NA |
| 193 | 0 | NA |
| 194 | 0 | NA |
| 195 | 0 | NA |
| 199 | 0 | NA |
| 201 | 0 | NA |
| 203 | 0 | NA |
| 205 | 0 | NA |
| 206 | 0 | NA |
| 210 | 0 | NA |
| 211 | 0 | NA |
| 212 | 0 | NA |
| 214 | 0 | NA |
| 215 | 0 | NA |
| 217 | 0 | NA |
| 218 | 0 | NA |
| 220 | 0 | NA |
| 221 | 0 | NA |
| 224 | 0 | NA |
| 226 | 0 | NA |
| 228 | 0 | NA |
| 231 | 0 | NA |
| 234 | 0 | NA |
| 236 | 0 | NA |
| 237 | 0 | NA |
| 238 | 0 | NA |
| 239 | 0 | NA |
| 241 | 0 | NA |
| 242 | 0 | NA |
| 243 | 0 | NA |
| 247 | 0 | NA |
| 249 | 0 | NA |
| 251 | 0 | NA |
| 253 | 0 | NA |
| 254 | 0 | NA |
| 255 | 0 | NA |
| 257 | 0 | NA |
| 260 | 0 | NA |
| 264 | 0 | NA |
| 265 | 0 | NA |
| 275 | 0 | NA |
| 276 | 0 | NA |
| 277 | 0 | NA |
| 278 | 0 | NA |
| 282 | 0 | NA |
| 285 | 0 | NA |
| 286 | 0 | NA |
| 288 | 0 | NA |
| 289 | 0 | NA |
| 290 | 0 | NA |
| 292 | 0 | NA |
| 296 | 0 | NA |
| 297 | 0 | NA |
| 304 | 0 | NA |
| 306 | 0 | NA |
| 308 | 0 | NA |
| 309 | 0 | NA |
| 321 | 0 | NA |
| 324 | 0 | NA |
| 326 | 0 | NA |
| 328 | 0 | NA |
| 329 | 0 | NA |
| 330 | 0 | NA |
| 331 | 0 | NA |
| 332 | 0 | NA |
| 338 | 0 | NA |
| 339 | 0 | NA |
| 340 | 0 | NA |
| 341 | 0 | NA |
| 342 | 0 | NA |
| 343 | 0 | NA |
| 345 | 0 | NA |
| 347 | 0 | NA |
| 349 | 0 | NA |
| 353 | 0 | NA |
| 354 | 0 | NA |
| 356 | 0 | NA |
| 357 | 0 | NA |
| 359 | 0 | NA |
| 362 | 0 | NA |
| 364 | 0 | NA |
| 366 | 0 | NA |
| 373 | 0 | NA |
| 374 | 0 | NA |
| 376 | 0 | NA |
| 377 | 0 | NA |
| 383 | 0 | NA |
| 384 | 0 | NA |
| 387 | 0 | NA |
| 389 | 0 | NA |
| 391 | 0 | NA |
| 393 | 0 | NA |
| 395 | 0 | NA |
| 396 | 0 | NA |
| 397 | 0 | NA |
| 399 | 0 | NA |
| 400 | 0 | NA |
| 403 | 0 | NA |
| 407 | 0 | NA |
| 408 | 0 | NA |
| 411 | 0 | NA |
| 412 | 0 | NA |
| 416 | 0 | NA |
| 419 | 0 | NA |
| 421 | 0 | NA |
| 423 | 0 | NA |
| 428 | 0 | NA |
| 429 | 0 | NA |
| 431 | 0 | NA |
| 432 | 0 | NA |
| 433 | 0 | NA |
| 435 | 0 | NA |
| 437 | 0 | NA |
| 438 | 0 | NA |
| 440 | 0 | NA |
| 442 | 0 | NA |
| 450 | 0 | NA |
| 451 | 0 | NA |
| 453 | 0 | NA |
| 454 | 0 | NA |
| 455 | 0 | NA |
| 457 | 0 | NA |
| 461 | 0 | NA |
| 462 | 0 | NA |
| 465 | 0 | NA |
| 470 | 0 | NA |
| 471 | 0 | NA |
| 473 | 0 | NA |
| 475 | 0 | NA |
| 476 | 0 | NA |
| 480 | 0 | NA |
| 484 | 0 | NA |
| 485 | 0 | NA |
| 487 | 0 | NA |
| 490 | 0 | NA |
| 493 | 0 | NA |
| 495 | 0 | NA |
| 496 | 0 | NA |
| 498 | 0 | NA |
| 499 | 0 | NA |
| 500 | 0 | NA |
| 501 | 0 | NA |
| 502 | 0 | NA |
| 503 | 0 | NA |
| 506 | 0 | NA |
| 508 | 0 | NA |
| 510 | 0 | NA |
| 513 | 0 | NA |
| 514 | 0 | NA |
| 515 | 0 | NA |
| 519 | 0 | NA |
| 521 | 0 | NA |
| 527 | 0 | NA |
| 529 | 0 | NA |
| 532 | 0 | NA |
| 534 | 0 | NA |
| 536 | 0 | NA |
| 537 | 0 | NA |
| 538 | 0 | NA |
| 544 | 0 | NA |
| 546 | 0 | NA |
| 548 | 0 | NA |
| 549 | 0 | NA |
| 551 | 0 | NA |
| 552 | 0 | NA |
| 554 | 0 | NA |
| 558 | 0 | NA |
| 566 | 0 | NA |
| 568 | 0 | NA |
| 571 | 0 | NA |
| 572 | 0 | NA |
| 573 | 0 | NA |
| 575 | 0 | NA |
| 576 | 0 | NA |
| 579 | 0 | NA |
| 580 | 0 | NA |
| 583 | 0 | NA |
| 585 | 0 | NA |
| 587 | 0 | NA |
| 588 | 0 | NA |
| 590 | 0 | NA |
| 591 | 0 | NA |
| 593 | 0 | NA |
| 594 | 0 | NA |
| 595 | 0 | NA |
| 597 | 0 | NA |
| 604 | 0 | NA |
| 607 | 0 | NA |
| 608 | 0 | NA |
| 610 | 0 | NA |
| 614 | 0 | NA |
| 615 | 0 | NA |
| 616 | 0 | NA |
| 618 | 0 | NA |
| 621 | 0 | NA |
| 623 | 0 | NA |
| 626 | 0 | NA |
| 630 | 0 | NA |
| 631 | 0 | NA |
| 634 | 0 | NA |
| 635 | 0 | NA |
| 636 | 0 | NA |
| 638 | 0 | NA |
| 639 | 0 | NA |
| 644 | 0 | NA |
| 646 | 0 | NA |
| 647 | 0 | NA |
| 650 | 0 | NA |
| 651 | 0 | NA |
| 654 | 0 | NA |
| 656 | 0 | NA |
| 657 | 0 | NA |
| 660 | 0 | NA |
| 664 | 0 | NA |
| 671 | 0 | NA |
| 672 | 0 | NA |
| 677 | 0 | NA |
| 678 | 0 | NA |
| 680 | 0 | NA |
| 682 | 0 | NA |
| 685 | 0 | NA |
| 687 | 0 | NA |
| 688 | 0 | NA |
| 694 | 0 | NA |
| 695 | 0 | NA |
| 697 | 0 | NA |
| 698 | 0 | NA |
| 700 | 0 | NA |
| 702 | 0 | NA |
| 705 | 0 | NA |
| 706 | 0 | NA |
| 710 | 0 | NA |
| 711 | 0 | NA |
| 712 | 0 | NA |
| 714 | 0 | NA |
| 715 | 0 | NA |
| 717 | 0 | NA |
| 722 | 0 | NA |
| 723 | 0 | NA |
| 724 | 0 | NA |
| 726 | 0 | NA |
| 728 | 0 | NA |
| 729 | 0 | NA |
| 730 | 0 | NA |
| 735 | 0 | NA |
| 737 | 0 | NA |
| 739 | 0 | NA |
| 740 | 0 | NA |
| 741 | 0 | NA |
| 742 | 0 | NA |
| 743 | 0 | NA |
| 750 | 0 | NA |
| 751 | 0 | NA |
| 752 | 0 | NA |
| 753 | 0 | NA |
| 755 | 0 | NA |
| 756 | 0 | NA |
| 757 | 0 | NA |
| 759 | 0 | NA |
| 761 | 0 | NA |
| 762 | 0 | NA |
| 763 | 0 | NA |
| 768 | 0 | NA |
| 771 | 0 | NA |
| 772 | 0 | NA |
| 774 | 0 | NA |
| 776 | 0 | NA |
| 777 | 0 | NA |
| 780 | 0 | NA |
| 782 | 0 | NA |
| 783 | 0 | NA |
| 787 | 0 | NA |
| 788 | 0 | NA |
| 789 | 0 | NA |
| 790 | 0 | NA |
| 794 | 0 | NA |
| 798 | 0 | NA |
| 801 | 0 | NA |
| 802 | 0 | NA |
| 805 | 0 | NA |
| 806 | 0 | NA |
| 807 | 0 | NA |
| 810 | 0 | NA |
| 813 | 0 | NA |
| 814 | 0 | NA |
| 815 | 0 | NA |
| 816 | 0 | NA |
| 819 | 0 | NA |
| 821 | 0 | NA |
| 822 | 0 | NA |
| 827 | 0 | NA |
| 829 | 0 | NA |
| 830 | 0 | NA |
| 832 | 0 | NA |
| 834 | 0 | NA |
| 835 | 0 | NA |
| 836 | 0 | NA |
| 837 | 0 | NA |
| 838 | 0 | NA |
| 839 | 0 | NA |
| 840 | 0 | NA |
| 841 | 0 | NA |
| 843 | 0 | NA |
| 844 | 0 | NA |
| 851 | 0 | NA |
| 856 | 0 | NA |
| 857 | 0 | NA |
| 862 | 0 | NA |
| 863 | 0 | NA |
| 864 | 0 | NA |
| 865 | 0 | NA |
| 866 | 0 | NA |
| 868 | 0 | NA |
| 871 | 0 | NA |
| 872 | 0 | NA |
| 873 | 0 | NA |
| 875 | 0 | NA |
| 877 | 0 | NA |
| 879 | 0 | NA |
| 880 | 0 | NA |
| 881 | 0 | NA |
| 884 | 0 | NA |
| 885 | 0 | NA |
| 887 | 0 | NA |
| 888 | 0 | NA |
| 893 | 0 | NA |
| 895 | 0 | NA |
| 897 | 0 | NA |
| 898 | 0 | NA |
| 901 | 0 | NA |
| 902 | 0 | NA |
| 905 | 0 | NA |
| 906 | 0 | NA |
| 909 | 0 | NA |
| 911 | 0 | NA |
| 912 | 0 | NA |
| 913 | 0 | NA |
| 914 | 0 | NA |
| 915 | 0 | NA |
| 916 | 0 | NA |
| 917 | 0 | NA |
| 918 | 0 | NA |
| 921 | 0 | NA |
| 922 | 0 | NA |
| 926 | 0 | NA |
| 931 | 0 | NA |
| 932 | 0 | NA |
| 934 | 0 | NA |
| 936 | 0 | NA |
| 937 | 0 | NA |
| 939 | 0 | NA |
| 941 | 0 | NA |
| 943 | 0 | NA |
| 944 | 0 | NA |
| 946 | 0 | NA |
| 951 | 0 | NA |
| 952 | 0 | NA |
| 953 | 0 | NA |
| 955 | 0 | NA |
| 956 | 0 | NA |
| 958 | 0 | NA |
| 959 | 0 | NA |
| 960 | 0 | NA |
| 961 | 0 | NA |
| 964 | 0 | NA |
| 968 | 0 | NA |
| 969 | 0 | NA |
| 970 | 0 | NA |
| 971 | 0 | NA |
| 972 | 0 | NA |
| 974 | 0 | NA |
| 975 | 0 | NA |
| 976 | 0 | NA |
| 977 | 0 | NA |
| 978 | 0 | NA |
| 979 | 0 | NA |
| 980 | 0 | NA |
| 981 | 0 | NA |
| 985 | 0 | NA |
| 986 | 0 | NA |
| 987 | 0 | NA |
| 990 | 0 | NA |
| 994 | 0 | NA |
| 996 | 0 | NA |
| 997 | 0 | NA |
| 1000 | 0 | NA |
| 1001 | 0 | NA |
| 1002 | 0 | NA |
| 1004 | 0 | NA |
| 1006 | 0 | NA |
| 1008 | 0 | NA |
| 1010 | 0 | NA |
| 1012 | 0 | NA |
| 1014 | 0 | NA |
| 1016 | 0 | NA |
| 1021 | 0 | NA |
| 1022 | 0 | NA |
| 1023 | 0 | NA |
| 1026 | 0 | NA |
| 1030 | 0 | NA |
| 1031 | 0 | NA |
| 1034 | 0 | NA |
| 1036 | 0 | NA |
| 1040 | 0 | NA |
| 1042 | 0 | NA |
| 1048 | 0 | NA |
| 1050 | 0 | NA |
| 1051 | 0 | NA |
| 1061 | 0 | NA |
| 1062 | 0 | NA |
| 1063 | 0 | NA |
| 1066 | 0 | NA |
| 1068 | 0 | NA |
| 1070 | 0 | NA |
| 1071 | 0 | NA |
| 1072 | 0 | NA |
| 1074 | 0 | NA |
| 1075 | 0 | NA |
| 1078 | 0 | NA |
| 1080 | 0 | NA |
| 1081 | 0 | NA |
| 1082 | 0 | NA |
| 1086 | 0 | NA |
| 1087 | 0 | NA |
| 1089 | 0 | NA |
| 1091 | 0 | NA |
| 1092 | 0 | NA |
| 1094 | 0 | NA |
| 1095 | 0 | NA |
| 1097 | 0 | NA |
| 1098 | 0 | NA |
| 1099 | 0 | NA |
| 1101 | 0 | NA |
| 1102 | 0 | NA |
| 1103 | 0 | NA |
| 1105 | 0 | NA |
| 1113 | 0 | NA |
| 1114 | 0 | NA |
| 1115 | 0 | NA |
| 1118 | 0 | NA |
| 1119 | 0 | NA |
| 1120 | 0 | NA |
| 1122 | 0 | NA |
| 1123 | 0 | NA |
| 1124 | 0 | NA |
| 1130 | 0 | NA |
| 1132 | 0 | NA |
| 1133 | 0 | NA |
| 1137 | 0 | NA |
| 1138 | 0 | NA |
| 1141 | 0 | NA |
| 1144 | 0 | NA |
| 1145 | 0 | NA |
| 1150 | 0 | NA |
| 1154 | 0 | NA |
| 1159 | 0 | NA |
| 1164 | 0 | NA |
| 1166 | 0 | NA |
| 1167 | 0 | NA |
| 1173 | 0 | NA |
| 1177 | 0 | NA |
| 1178 | 0 | NA |
| 1181 | 0 | NA |
| 1186 | 0 | NA |
| 1187 | 0 | NA |
| 1189 | 0 | NA |
| 1192 | 0 | NA |
| 1193 | 0 | NA |
| 1194 | 0 | NA |
| 1195 | 0 | NA |
| 1196 | 0 | NA |
| 1199 | 0 | NA |
| 1201 | 0 | NA |
| 1202 | 0 | NA |
| 1205 | 0 | NA |
| 1207 | 0 | NA |
| 1208 | 0 | NA |
| 1209 | 0 | NA |
| 1212 | 0 | NA |
| 1213 | 0 | NA |
| 1214 | 0 | NA |
| 1215 | 0 | NA |
| 1216 | 0 | NA |
| 1217 | 0 | NA |
| 1218 | 0 | NA |
| 1219 | 0 | NA |
| 1220 | 0 | NA |
| 1221 | 0 | NA |
| 1224 | 0 | NA |
| 1226 | 0 | NA |
| 1228 | 0 | NA |
| 1230 | 0 | NA |
| 1232 | 0 | NA |
| 1233 | 0 | NA |
| 1234 | 0 | NA |
| 1235 | 0 | NA |
| 1236 | 0 | NA |
| 1237 | 0 | NA |
| 1239 | 0 | NA |
| 1241 | 0 | NA |
| 1242 | 0 | NA |
| 1247 | 0 | NA |
| 1248 | 0 | NA |
| 1250 | 0 | NA |
| 1253 | 0 | NA |
| 1258 | 0 | NA |
| 1259 | 0 | NA |
| 1260 | 0 | NA |
| 1262 | 0 | NA |
| 1265 | 0 | NA |
| 1266 | 0 | NA |
| 1270 | 0 | NA |
| 1273 | 0 | NA |
| 1276 | 0 | NA |
| 1280 | 0 | NA |
| 1281 | 0 | NA |
| 1283 | 0 | NA |
| 1284 | 0 | NA |
| 1288 | 0 | NA |
| 1292 | 0 | NA |
| 1293 | 0 | NA |
| 1295 | 0 | NA |
| 1296 | 0 | NA |
| 1297 | 0 | NA |
| 1298 | 0 | NA |
| 1300 | 0 | NA |
| 1304 | 0 | NA |
| 1305 | 0 | NA |
| 1308 | 0 | NA |
| 1309 | 0 | NA |
| 1310 | 0 | NA |
| 1312 | 0 | NA |
| 1320 | 0 | NA |
| 1322 | 0 | NA |
| 1324 | 0 | NA |
| 1326 | 0 | NA |
| 1327 | 0 | NA |
| 1332 | 0 | NA |
| 1334 | 0 | NA |
| 1335 | 0 | NA |
| 1337 | 0 | NA |
| 1338 | 0 | NA |
| 1339 | 0 | NA |
| 1340 | 0 | NA |
| 1341 | 0 | NA |
| 1342 | 0 | NA |
| 1346 | 0 | NA |
| 1350 | 0 | NA |
| 1351 | 0 | NA |
| 1353 | 0 | NA |
| 1357 | 0 | NA |
| 1359 | 0 | NA |
| 1361 | 0 | NA |
| 1365 | 0 | NA |
| 1366 | 0 | NA |
| 1367 | 0 | NA |
| 1369 | 0 | NA |
| 1371 | 0 | NA |
| 1377 | 0 | NA |
| 1378 | 0 | NA |
| 1379 | 0 | NA |
| 1380 | 0 | NA |
| 1381 | 0 | NA |
| 1383 | 0 | NA |
| 1384 | 0 | NA |
| 1385 | 0 | NA |
| 1386 | 0 | NA |
| 1391 | 0 | NA |
| 1392 | 0 | NA |
| 1398 | 0 | NA |
| 1399 | 0 | NA |
| 1404 | 0 | NA |
| 1405 | 0 | NA |
| 1407 | 0 | NA |
| 1408 | 0 | NA |
| 1409 | 0 | NA |
| 1411 | 0 | NA |
| 1412 | 0 | NA |
| 1413 | 0 | NA |
| 1417 | 0 | NA |
| 1419 | 0 | NA |
| 1423 | 0 | NA |
| 1426 | 0 | NA |
| 1432 | 0 | NA |
| 1433 | 0 | NA |
| 1437 | 0 | NA |
| 1439 | 0 | NA |
| 1445 | 0 | NA |
| 1446 | 0 | NA |
| 1447 | 0 | NA |
| 1449 | 0 | NA |
| 1450 | 0 | NA |
| 1451 | 0 | NA |
| 1453 | 0 | NA |
| 1454 | 0 | NA |
| 1455 | 0 | NA |
| 1459 | 0 | NA |
| 1460 | 0 | NA |
Garagem
Similarmente, temos para se a casa possui ou não garagem todas as
condçoes a seguir devem ocorrer:
- GarageType = NA - GarageYrBlt = NA - GarageFinish = NA - GarageCars =
0 - GarageArea = 0 - GarageQual = NA - GarageCond = NA
ds2_all[ds2_all$table == "train",] %>%
dplyr::select("Id" | contains("Garage")) %>%
filter_all(any_vars( is.na(.))) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond |
|---|---|---|---|---|---|---|---|
| 40 | NA | NA | NA | 0 | 0 | NA | NA |
| 49 | NA | NA | NA | 0 | 0 | NA | NA |
| 79 | NA | NA | NA | 0 | 0 | NA | NA |
| 89 | NA | NA | NA | 0 | 0 | NA | NA |
| 90 | NA | NA | NA | 0 | 0 | NA | NA |
| 100 | NA | NA | NA | 0 | 0 | NA | NA |
| 109 | NA | NA | NA | 0 | 0 | NA | NA |
| 126 | NA | NA | NA | 0 | 0 | NA | NA |
| 128 | NA | NA | NA | 0 | 0 | NA | NA |
| 141 | NA | NA | NA | 0 | 0 | NA | NA |
| 149 | NA | NA | NA | 0 | 0 | NA | NA |
| 156 | NA | NA | NA | 0 | 0 | NA | NA |
| 164 | NA | NA | NA | 0 | 0 | NA | NA |
| 166 | NA | NA | NA | 0 | 0 | NA | NA |
| 199 | NA | NA | NA | 0 | 0 | NA | NA |
| 211 | NA | NA | NA | 0 | 0 | NA | NA |
| 242 | NA | NA | NA | 0 | 0 | NA | NA |
| 251 | NA | NA | NA | 0 | 0 | NA | NA |
| 288 | NA | NA | NA | 0 | 0 | NA | NA |
| 292 | NA | NA | NA | 0 | 0 | NA | NA |
| 308 | NA | NA | NA | 0 | 0 | NA | NA |
| 376 | NA | NA | NA | 0 | 0 | NA | NA |
| 387 | NA | NA | NA | 0 | 0 | NA | NA |
| 394 | NA | NA | NA | 0 | 0 | NA | NA |
| 432 | NA | NA | NA | 0 | 0 | NA | NA |
| 435 | NA | NA | NA | 0 | 0 | NA | NA |
| 442 | NA | NA | NA | 0 | 0 | NA | NA |
| 465 | NA | NA | NA | 0 | 0 | NA | NA |
| 496 | NA | NA | NA | 0 | 0 | NA | NA |
| 521 | NA | NA | NA | 0 | 0 | NA | NA |
| 529 | NA | NA | NA | 0 | 0 | NA | NA |
| 534 | NA | NA | NA | 0 | 0 | NA | NA |
| 536 | NA | NA | NA | 0 | 0 | NA | NA |
| 563 | NA | NA | NA | 0 | 0 | NA | NA |
| 583 | NA | NA | NA | 0 | 0 | NA | NA |
| 614 | NA | NA | NA | 0 | 0 | NA | NA |
| 615 | NA | NA | NA | 0 | 0 | NA | NA |
| 621 | NA | NA | NA | 0 | 0 | NA | NA |
| 636 | NA | NA | NA | 0 | 0 | NA | NA |
| 637 | NA | NA | NA | 0 | 0 | NA | NA |
| 639 | NA | NA | NA | 0 | 0 | NA | NA |
| 650 | NA | NA | NA | 0 | 0 | NA | NA |
| 706 | NA | NA | NA | 0 | 0 | NA | NA |
| 711 | NA | NA | NA | 0 | 0 | NA | NA |
| 739 | NA | NA | NA | 0 | 0 | NA | NA |
| 751 | NA | NA | NA | 0 | 0 | NA | NA |
| 785 | NA | NA | NA | 0 | 0 | NA | NA |
| 827 | NA | NA | NA | 0 | 0 | NA | NA |
| 844 | NA | NA | NA | 0 | 0 | NA | NA |
| 922 | NA | NA | NA | 0 | 0 | NA | NA |
| 943 | NA | NA | NA | 0 | 0 | NA | NA |
| 955 | NA | NA | NA | 0 | 0 | NA | NA |
| 961 | NA | NA | NA | 0 | 0 | NA | NA |
| 969 | NA | NA | NA | 0 | 0 | NA | NA |
| 971 | NA | NA | NA | 0 | 0 | NA | NA |
| 977 | NA | NA | NA | 0 | 0 | NA | NA |
| 1010 | NA | NA | NA | 0 | 0 | NA | NA |
| 1012 | NA | NA | NA | 0 | 0 | NA | NA |
| 1031 | NA | NA | NA | 0 | 0 | NA | NA |
| 1039 | NA | NA | NA | 0 | 0 | NA | NA |
| 1097 | NA | NA | NA | 0 | 0 | NA | NA |
| 1124 | NA | NA | NA | 0 | 0 | NA | NA |
| 1132 | NA | NA | NA | 0 | 0 | NA | NA |
| 1138 | NA | NA | NA | 0 | 0 | NA | NA |
| 1144 | NA | NA | NA | 0 | 0 | NA | NA |
| 1174 | NA | NA | NA | 0 | 0 | NA | NA |
| 1180 | NA | NA | NA | 0 | 0 | NA | NA |
| 1219 | NA | NA | NA | 0 | 0 | NA | NA |
| 1220 | NA | NA | NA | 0 | 0 | NA | NA |
| 1235 | NA | NA | NA | 0 | 0 | NA | NA |
| 1258 | NA | NA | NA | 0 | 0 | NA | NA |
| 1284 | NA | NA | NA | 0 | 0 | NA | NA |
| 1324 | NA | NA | NA | 0 | 0 | NA | NA |
| 1326 | NA | NA | NA | 0 | 0 | NA | NA |
| 1327 | NA | NA | NA | 0 | 0 | NA | NA |
| 1338 | NA | NA | NA | 0 | 0 | NA | NA |
| 1350 | NA | NA | NA | 0 | 0 | NA | NA |
| 1408 | NA | NA | NA | 0 | 0 | NA | NA |
| 1450 | NA | NA | NA | 0 | 0 | NA | NA |
| 1451 | NA | NA | NA | 0 | 0 | NA | NA |
| 1454 | NA | NA | NA | 0 | 0 | NA | NA |
Piscina
Similarmente, temos para se a casa possui ou não piscina:
- PoolArea = 0 - PoolQC = NA
ambas as condições tem que ocorrer.
ds2_all[ds2_all$table == "train",] %>%
dplyr::select("Id" | contains("Pool")) %>%
filter_all(any_vars( is.na(.))) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | PoolArea | PoolQC |
|---|---|---|
| 1 | 0 | NA |
| 2 | 0 | NA |
| 3 | 0 | NA |
| 4 | 0 | NA |
| 5 | 0 | NA |
| 6 | 0 | NA |
| 7 | 0 | NA |
| 8 | 0 | NA |
| 9 | 0 | NA |
| 10 | 0 | NA |
| 11 | 0 | NA |
| 12 | 0 | NA |
| 13 | 0 | NA |
| 14 | 0 | NA |
| 15 | 0 | NA |
| 16 | 0 | NA |
| 17 | 0 | NA |
| 18 | 0 | NA |
| 19 | 0 | NA |
| 20 | 0 | NA |
| 21 | 0 | NA |
| 22 | 0 | NA |
| 23 | 0 | NA |
| 24 | 0 | NA |
| 25 | 0 | NA |
| 26 | 0 | NA |
| 27 | 0 | NA |
| 28 | 0 | NA |
| 29 | 0 | NA |
| 30 | 0 | NA |
| 31 | 0 | NA |
| 32 | 0 | NA |
| 33 | 0 | NA |
| 34 | 0 | NA |
| 35 | 0 | NA |
| 36 | 0 | NA |
| 37 | 0 | NA |
| 38 | 0 | NA |
| 39 | 0 | NA |
| 40 | 0 | NA |
| 41 | 0 | NA |
| 42 | 0 | NA |
| 43 | 0 | NA |
| 44 | 0 | NA |
| 45 | 0 | NA |
| 46 | 0 | NA |
| 47 | 0 | NA |
| 48 | 0 | NA |
| 49 | 0 | NA |
| 50 | 0 | NA |
| 51 | 0 | NA |
| 52 | 0 | NA |
| 53 | 0 | NA |
| 54 | 0 | NA |
| 55 | 0 | NA |
| 56 | 0 | NA |
| 57 | 0 | NA |
| 58 | 0 | NA |
| 59 | 0 | NA |
| 60 | 0 | NA |
| 61 | 0 | NA |
| 62 | 0 | NA |
| 63 | 0 | NA |
| 64 | 0 | NA |
| 65 | 0 | NA |
| 66 | 0 | NA |
| 67 | 0 | NA |
| 68 | 0 | NA |
| 69 | 0 | NA |
| 70 | 0 | NA |
| 71 | 0 | NA |
| 72 | 0 | NA |
| 73 | 0 | NA |
| 74 | 0 | NA |
| 75 | 0 | NA |
| 76 | 0 | NA |
| 77 | 0 | NA |
| 78 | 0 | NA |
| 79 | 0 | NA |
| 80 | 0 | NA |
| 81 | 0 | NA |
| 82 | 0 | NA |
| 83 | 0 | NA |
| 84 | 0 | NA |
| 85 | 0 | NA |
| 86 | 0 | NA |
| 87 | 0 | NA |
| 88 | 0 | NA |
| 89 | 0 | NA |
| 90 | 0 | NA |
| 91 | 0 | NA |
| 92 | 0 | NA |
| 93 | 0 | NA |
| 94 | 0 | NA |
| 95 | 0 | NA |
| 96 | 0 | NA |
| 97 | 0 | NA |
| 98 | 0 | NA |
| 99 | 0 | NA |
| 100 | 0 | NA |
| 101 | 0 | NA |
| 102 | 0 | NA |
| 103 | 0 | NA |
| 104 | 0 | NA |
| 105 | 0 | NA |
| 106 | 0 | NA |
| 107 | 0 | NA |
| 108 | 0 | NA |
| 109 | 0 | NA |
| 110 | 0 | NA |
| 111 | 0 | NA |
| 112 | 0 | NA |
| 113 | 0 | NA |
| 114 | 0 | NA |
| 115 | 0 | NA |
| 116 | 0 | NA |
| 117 | 0 | NA |
| 118 | 0 | NA |
| 119 | 0 | NA |
| 120 | 0 | NA |
| 121 | 0 | NA |
| 122 | 0 | NA |
| 123 | 0 | NA |
| 124 | 0 | NA |
| 125 | 0 | NA |
| 126 | 0 | NA |
| 127 | 0 | NA |
| 128 | 0 | NA |
| 129 | 0 | NA |
| 130 | 0 | NA |
| 131 | 0 | NA |
| 132 | 0 | NA |
| 133 | 0 | NA |
| 134 | 0 | NA |
| 135 | 0 | NA |
| 136 | 0 | NA |
| 137 | 0 | NA |
| 138 | 0 | NA |
| 139 | 0 | NA |
| 140 | 0 | NA |
| 141 | 0 | NA |
| 142 | 0 | NA |
| 143 | 0 | NA |
| 144 | 0 | NA |
| 145 | 0 | NA |
| 146 | 0 | NA |
| 147 | 0 | NA |
| 148 | 0 | NA |
| 149 | 0 | NA |
| 150 | 0 | NA |
| 151 | 0 | NA |
| 152 | 0 | NA |
| 153 | 0 | NA |
| 154 | 0 | NA |
| 155 | 0 | NA |
| 156 | 0 | NA |
| 157 | 0 | NA |
| 158 | 0 | NA |
| 159 | 0 | NA |
| 160 | 0 | NA |
| 161 | 0 | NA |
| 162 | 0 | NA |
| 163 | 0 | NA |
| 164 | 0 | NA |
| 165 | 0 | NA |
| 166 | 0 | NA |
| 167 | 0 | NA |
| 168 | 0 | NA |
| 169 | 0 | NA |
| 170 | 0 | NA |
| 171 | 0 | NA |
| 172 | 0 | NA |
| 173 | 0 | NA |
| 174 | 0 | NA |
| 175 | 0 | NA |
| 176 | 0 | NA |
| 177 | 0 | NA |
| 178 | 0 | NA |
| 179 | 0 | NA |
| 180 | 0 | NA |
| 181 | 0 | NA |
| 182 | 0 | NA |
| 183 | 0 | NA |
| 184 | 0 | NA |
| 185 | 0 | NA |
| 186 | 0 | NA |
| 187 | 0 | NA |
| 188 | 0 | NA |
| 189 | 0 | NA |
| 190 | 0 | NA |
| 191 | 0 | NA |
| 192 | 0 | NA |
| 193 | 0 | NA |
| 194 | 0 | NA |
| 195 | 0 | NA |
| 196 | 0 | NA |
| 197 | 0 | NA |
| 199 | 0 | NA |
| 200 | 0 | NA |
| 201 | 0 | NA |
| 202 | 0 | NA |
| 203 | 0 | NA |
| 204 | 0 | NA |
| 205 | 0 | NA |
| 206 | 0 | NA |
| 207 | 0 | NA |
| 208 | 0 | NA |
| 209 | 0 | NA |
| 210 | 0 | NA |
| 211 | 0 | NA |
| 212 | 0 | NA |
| 213 | 0 | NA |
| 214 | 0 | NA |
| 215 | 0 | NA |
| 216 | 0 | NA |
| 217 | 0 | NA |
| 218 | 0 | NA |
| 219 | 0 | NA |
| 220 | 0 | NA |
| 221 | 0 | NA |
| 222 | 0 | NA |
| 223 | 0 | NA |
| 224 | 0 | NA |
| 225 | 0 | NA |
| 226 | 0 | NA |
| 227 | 0 | NA |
| 228 | 0 | NA |
| 229 | 0 | NA |
| 230 | 0 | NA |
| 231 | 0 | NA |
| 232 | 0 | NA |
| 233 | 0 | NA |
| 234 | 0 | NA |
| 235 | 0 | NA |
| 236 | 0 | NA |
| 237 | 0 | NA |
| 238 | 0 | NA |
| 239 | 0 | NA |
| 240 | 0 | NA |
| 241 | 0 | NA |
| 242 | 0 | NA |
| 243 | 0 | NA |
| 244 | 0 | NA |
| 245 | 0 | NA |
| 246 | 0 | NA |
| 247 | 0 | NA |
| 248 | 0 | NA |
| 249 | 0 | NA |
| 250 | 0 | NA |
| 251 | 0 | NA |
| 252 | 0 | NA |
| 253 | 0 | NA |
| 254 | 0 | NA |
| 255 | 0 | NA |
| 256 | 0 | NA |
| 257 | 0 | NA |
| 258 | 0 | NA |
| 259 | 0 | NA |
| 260 | 0 | NA |
| 261 | 0 | NA |
| 262 | 0 | NA |
| 263 | 0 | NA |
| 264 | 0 | NA |
| 265 | 0 | NA |
| 266 | 0 | NA |
| 267 | 0 | NA |
| 268 | 0 | NA |
| 269 | 0 | NA |
| 270 | 0 | NA |
| 271 | 0 | NA |
| 272 | 0 | NA |
| 273 | 0 | NA |
| 274 | 0 | NA |
| 275 | 0 | NA |
| 276 | 0 | NA |
| 277 | 0 | NA |
| 278 | 0 | NA |
| 279 | 0 | NA |
| 280 | 0 | NA |
| 281 | 0 | NA |
| 282 | 0 | NA |
| 283 | 0 | NA |
| 284 | 0 | NA |
| 285 | 0 | NA |
| 286 | 0 | NA |
| 287 | 0 | NA |
| 288 | 0 | NA |
| 289 | 0 | NA |
| 290 | 0 | NA |
| 291 | 0 | NA |
| 292 | 0 | NA |
| 293 | 0 | NA |
| 294 | 0 | NA |
| 295 | 0 | NA |
| 296 | 0 | NA |
| 297 | 0 | NA |
| 298 | 0 | NA |
| 299 | 0 | NA |
| 300 | 0 | NA |
| 301 | 0 | NA |
| 302 | 0 | NA |
| 303 | 0 | NA |
| 304 | 0 | NA |
| 305 | 0 | NA |
| 306 | 0 | NA |
| 307 | 0 | NA |
| 308 | 0 | NA |
| 309 | 0 | NA |
| 310 | 0 | NA |
| 311 | 0 | NA |
| 312 | 0 | NA |
| 313 | 0 | NA |
| 314 | 0 | NA |
| 315 | 0 | NA |
| 316 | 0 | NA |
| 317 | 0 | NA |
| 318 | 0 | NA |
| 319 | 0 | NA |
| 320 | 0 | NA |
| 321 | 0 | NA |
| 322 | 0 | NA |
| 323 | 0 | NA |
| 324 | 0 | NA |
| 325 | 0 | NA |
| 326 | 0 | NA |
| 327 | 0 | NA |
| 328 | 0 | NA |
| 329 | 0 | NA |
| 330 | 0 | NA |
| 331 | 0 | NA |
| 332 | 0 | NA |
| 333 | 0 | NA |
| 334 | 0 | NA |
| 335 | 0 | NA |
| 336 | 0 | NA |
| 337 | 0 | NA |
| 338 | 0 | NA |
| 339 | 0 | NA |
| 340 | 0 | NA |
| 341 | 0 | NA |
| 342 | 0 | NA |
| 343 | 0 | NA |
| 344 | 0 | NA |
| 345 | 0 | NA |
| 346 | 0 | NA |
| 347 | 0 | NA |
| 348 | 0 | NA |
| 349 | 0 | NA |
| 350 | 0 | NA |
| 351 | 0 | NA |
| 352 | 0 | NA |
| 353 | 0 | NA |
| 354 | 0 | NA |
| 355 | 0 | NA |
| 356 | 0 | NA |
| 357 | 0 | NA |
| 358 | 0 | NA |
| 359 | 0 | NA |
| 360 | 0 | NA |
| 361 | 0 | NA |
| 362 | 0 | NA |
| 363 | 0 | NA |
| 364 | 0 | NA |
| 365 | 0 | NA |
| 366 | 0 | NA |
| 367 | 0 | NA |
| 368 | 0 | NA |
| 369 | 0 | NA |
| 370 | 0 | NA |
| 371 | 0 | NA |
| 372 | 0 | NA |
| 373 | 0 | NA |
| 374 | 0 | NA |
| 375 | 0 | NA |
| 376 | 0 | NA |
| 377 | 0 | NA |
| 378 | 0 | NA |
| 379 | 0 | NA |
| 380 | 0 | NA |
| 381 | 0 | NA |
| 382 | 0 | NA |
| 383 | 0 | NA |
| 384 | 0 | NA |
| 385 | 0 | NA |
| 386 | 0 | NA |
| 387 | 0 | NA |
| 388 | 0 | NA |
| 389 | 0 | NA |
| 390 | 0 | NA |
| 391 | 0 | NA |
| 392 | 0 | NA |
| 393 | 0 | NA |
| 394 | 0 | NA |
| 395 | 0 | NA |
| 396 | 0 | NA |
| 397 | 0 | NA |
| 398 | 0 | NA |
| 399 | 0 | NA |
| 400 | 0 | NA |
| 401 | 0 | NA |
| 402 | 0 | NA |
| 403 | 0 | NA |
| 404 | 0 | NA |
| 405 | 0 | NA |
| 406 | 0 | NA |
| 407 | 0 | NA |
| 408 | 0 | NA |
| 409 | 0 | NA |
| 410 | 0 | NA |
| 411 | 0 | NA |
| 412 | 0 | NA |
| 413 | 0 | NA |
| 414 | 0 | NA |
| 415 | 0 | NA |
| 416 | 0 | NA |
| 417 | 0 | NA |
| 418 | 0 | NA |
| 419 | 0 | NA |
| 420 | 0 | NA |
| 421 | 0 | NA |
| 422 | 0 | NA |
| 423 | 0 | NA |
| 424 | 0 | NA |
| 425 | 0 | NA |
| 426 | 0 | NA |
| 427 | 0 | NA |
| 428 | 0 | NA |
| 429 | 0 | NA |
| 430 | 0 | NA |
| 431 | 0 | NA |
| 432 | 0 | NA |
| 433 | 0 | NA |
| 434 | 0 | NA |
| 435 | 0 | NA |
| 436 | 0 | NA |
| 437 | 0 | NA |
| 438 | 0 | NA |
| 439 | 0 | NA |
| 440 | 0 | NA |
| 441 | 0 | NA |
| 442 | 0 | NA |
| 443 | 0 | NA |
| 444 | 0 | NA |
| 445 | 0 | NA |
| 446 | 0 | NA |
| 447 | 0 | NA |
| 448 | 0 | NA |
| 449 | 0 | NA |
| 450 | 0 | NA |
| 451 | 0 | NA |
| 452 | 0 | NA |
| 453 | 0 | NA |
| 454 | 0 | NA |
| 455 | 0 | NA |
| 456 | 0 | NA |
| 457 | 0 | NA |
| 458 | 0 | NA |
| 459 | 0 | NA |
| 460 | 0 | NA |
| 461 | 0 | NA |
| 462 | 0 | NA |
| 463 | 0 | NA |
| 464 | 0 | NA |
| 465 | 0 | NA |
| 466 | 0 | NA |
| 467 | 0 | NA |
| 468 | 0 | NA |
| 469 | 0 | NA |
| 470 | 0 | NA |
| 471 | 0 | NA |
| 472 | 0 | NA |
| 473 | 0 | NA |
| 474 | 0 | NA |
| 475 | 0 | NA |
| 476 | 0 | NA |
| 477 | 0 | NA |
| 478 | 0 | NA |
| 479 | 0 | NA |
| 480 | 0 | NA |
| 481 | 0 | NA |
| 482 | 0 | NA |
| 483 | 0 | NA |
| 484 | 0 | NA |
| 485 | 0 | NA |
| 486 | 0 | NA |
| 487 | 0 | NA |
| 488 | 0 | NA |
| 489 | 0 | NA |
| 490 | 0 | NA |
| 491 | 0 | NA |
| 492 | 0 | NA |
| 493 | 0 | NA |
| 494 | 0 | NA |
| 495 | 0 | NA |
| 496 | 0 | NA |
| 497 | 0 | NA |
| 498 | 0 | NA |
| 499 | 0 | NA |
| 500 | 0 | NA |
| 501 | 0 | NA |
| 502 | 0 | NA |
| 503 | 0 | NA |
| 504 | 0 | NA |
| 505 | 0 | NA |
| 506 | 0 | NA |
| 507 | 0 | NA |
| 508 | 0 | NA |
| 509 | 0 | NA |
| 510 | 0 | NA |
| 511 | 0 | NA |
| 512 | 0 | NA |
| 513 | 0 | NA |
| 514 | 0 | NA |
| 515 | 0 | NA |
| 516 | 0 | NA |
| 517 | 0 | NA |
| 518 | 0 | NA |
| 519 | 0 | NA |
| 520 | 0 | NA |
| 521 | 0 | NA |
| 522 | 0 | NA |
| 523 | 0 | NA |
| 524 | 0 | NA |
| 525 | 0 | NA |
| 526 | 0 | NA |
| 527 | 0 | NA |
| 528 | 0 | NA |
| 529 | 0 | NA |
| 530 | 0 | NA |
| 531 | 0 | NA |
| 532 | 0 | NA |
| 533 | 0 | NA |
| 534 | 0 | NA |
| 535 | 0 | NA |
| 536 | 0 | NA |
| 537 | 0 | NA |
| 538 | 0 | NA |
| 539 | 0 | NA |
| 540 | 0 | NA |
| 541 | 0 | NA |
| 542 | 0 | NA |
| 543 | 0 | NA |
| 544 | 0 | NA |
| 545 | 0 | NA |
| 546 | 0 | NA |
| 547 | 0 | NA |
| 548 | 0 | NA |
| 549 | 0 | NA |
| 550 | 0 | NA |
| 551 | 0 | NA |
| 552 | 0 | NA |
| 553 | 0 | NA |
| 554 | 0 | NA |
| 555 | 0 | NA |
| 556 | 0 | NA |
| 557 | 0 | NA |
| 558 | 0 | NA |
| 559 | 0 | NA |
| 560 | 0 | NA |
| 561 | 0 | NA |
| 562 | 0 | NA |
| 563 | 0 | NA |
| 564 | 0 | NA |
| 565 | 0 | NA |
| 566 | 0 | NA |
| 567 | 0 | NA |
| 568 | 0 | NA |
| 569 | 0 | NA |
| 570 | 0 | NA |
| 571 | 0 | NA |
| 572 | 0 | NA |
| 573 | 0 | NA |
| 574 | 0 | NA |
| 575 | 0 | NA |
| 576 | 0 | NA |
| 577 | 0 | NA |
| 578 | 0 | NA |
| 579 | 0 | NA |
| 580 | 0 | NA |
| 581 | 0 | NA |
| 582 | 0 | NA |
| 583 | 0 | NA |
| 584 | 0 | NA |
| 585 | 0 | NA |
| 586 | 0 | NA |
| 587 | 0 | NA |
| 588 | 0 | NA |
| 589 | 0 | NA |
| 590 | 0 | NA |
| 591 | 0 | NA |
| 592 | 0 | NA |
| 593 | 0 | NA |
| 594 | 0 | NA |
| 595 | 0 | NA |
| 596 | 0 | NA |
| 597 | 0 | NA |
| 598 | 0 | NA |
| 599 | 0 | NA |
| 600 | 0 | NA |
| 601 | 0 | NA |
| 602 | 0 | NA |
| 603 | 0 | NA |
| 604 | 0 | NA |
| 605 | 0 | NA |
| 606 | 0 | NA |
| 607 | 0 | NA |
| 608 | 0 | NA |
| 609 | 0 | NA |
| 610 | 0 | NA |
| 611 | 0 | NA |
| 612 | 0 | NA |
| 613 | 0 | NA |
| 614 | 0 | NA |
| 615 | 0 | NA |
| 616 | 0 | NA |
| 617 | 0 | NA |
| 618 | 0 | NA |
| 619 | 0 | NA |
| 620 | 0 | NA |
| 621 | 0 | NA |
| 622 | 0 | NA |
| 623 | 0 | NA |
| 624 | 0 | NA |
| 625 | 0 | NA |
| 626 | 0 | NA |
| 627 | 0 | NA |
| 628 | 0 | NA |
| 629 | 0 | NA |
| 630 | 0 | NA |
| 631 | 0 | NA |
| 632 | 0 | NA |
| 633 | 0 | NA |
| 634 | 0 | NA |
| 635 | 0 | NA |
| 636 | 0 | NA |
| 637 | 0 | NA |
| 638 | 0 | NA |
| 639 | 0 | NA |
| 640 | 0 | NA |
| 641 | 0 | NA |
| 642 | 0 | NA |
| 643 | 0 | NA |
| 644 | 0 | NA |
| 645 | 0 | NA |
| 646 | 0 | NA |
| 647 | 0 | NA |
| 648 | 0 | NA |
| 649 | 0 | NA |
| 650 | 0 | NA |
| 651 | 0 | NA |
| 652 | 0 | NA |
| 653 | 0 | NA |
| 654 | 0 | NA |
| 655 | 0 | NA |
| 656 | 0 | NA |
| 657 | 0 | NA |
| 658 | 0 | NA |
| 659 | 0 | NA |
| 660 | 0 | NA |
| 661 | 0 | NA |
| 662 | 0 | NA |
| 663 | 0 | NA |
| 664 | 0 | NA |
| 665 | 0 | NA |
| 666 | 0 | NA |
| 667 | 0 | NA |
| 668 | 0 | NA |
| 669 | 0 | NA |
| 670 | 0 | NA |
| 671 | 0 | NA |
| 672 | 0 | NA |
| 673 | 0 | NA |
| 674 | 0 | NA |
| 675 | 0 | NA |
| 676 | 0 | NA |
| 677 | 0 | NA |
| 678 | 0 | NA |
| 679 | 0 | NA |
| 680 | 0 | NA |
| 681 | 0 | NA |
| 682 | 0 | NA |
| 683 | 0 | NA |
| 684 | 0 | NA |
| 685 | 0 | NA |
| 686 | 0 | NA |
| 687 | 0 | NA |
| 688 | 0 | NA |
| 689 | 0 | NA |
| 690 | 0 | NA |
| 691 | 0 | NA |
| 692 | 0 | NA |
| 693 | 0 | NA |
| 694 | 0 | NA |
| 695 | 0 | NA |
| 696 | 0 | NA |
| 697 | 0 | NA |
| 698 | 0 | NA |
| 699 | 0 | NA |
| 700 | 0 | NA |
| 701 | 0 | NA |
| 702 | 0 | NA |
| 703 | 0 | NA |
| 704 | 0 | NA |
| 705 | 0 | NA |
| 706 | 0 | NA |
| 707 | 0 | NA |
| 708 | 0 | NA |
| 709 | 0 | NA |
| 710 | 0 | NA |
| 711 | 0 | NA |
| 712 | 0 | NA |
| 713 | 0 | NA |
| 714 | 0 | NA |
| 715 | 0 | NA |
| 716 | 0 | NA |
| 717 | 0 | NA |
| 718 | 0 | NA |
| 719 | 0 | NA |
| 720 | 0 | NA |
| 721 | 0 | NA |
| 722 | 0 | NA |
| 723 | 0 | NA |
| 724 | 0 | NA |
| 725 | 0 | NA |
| 726 | 0 | NA |
| 727 | 0 | NA |
| 728 | 0 | NA |
| 729 | 0 | NA |
| 730 | 0 | NA |
| 731 | 0 | NA |
| 732 | 0 | NA |
| 733 | 0 | NA |
| 734 | 0 | NA |
| 735 | 0 | NA |
| 736 | 0 | NA |
| 737 | 0 | NA |
| 738 | 0 | NA |
| 739 | 0 | NA |
| 740 | 0 | NA |
| 741 | 0 | NA |
| 742 | 0 | NA |
| 743 | 0 | NA |
| 744 | 0 | NA |
| 745 | 0 | NA |
| 746 | 0 | NA |
| 747 | 0 | NA |
| 748 | 0 | NA |
| 749 | 0 | NA |
| 750 | 0 | NA |
| 751 | 0 | NA |
| 752 | 0 | NA |
| 753 | 0 | NA |
| 754 | 0 | NA |
| 755 | 0 | NA |
| 756 | 0 | NA |
| 757 | 0 | NA |
| 758 | 0 | NA |
| 759 | 0 | NA |
| 760 | 0 | NA |
| 761 | 0 | NA |
| 762 | 0 | NA |
| 763 | 0 | NA |
| 764 | 0 | NA |
| 765 | 0 | NA |
| 766 | 0 | NA |
| 767 | 0 | NA |
| 768 | 0 | NA |
| 769 | 0 | NA |
| 770 | 0 | NA |
| 771 | 0 | NA |
| 772 | 0 | NA |
| 773 | 0 | NA |
| 774 | 0 | NA |
| 775 | 0 | NA |
| 776 | 0 | NA |
| 777 | 0 | NA |
| 778 | 0 | NA |
| 779 | 0 | NA |
| 780 | 0 | NA |
| 781 | 0 | NA |
| 782 | 0 | NA |
| 783 | 0 | NA |
| 784 | 0 | NA |
| 785 | 0 | NA |
| 786 | 0 | NA |
| 787 | 0 | NA |
| 788 | 0 | NA |
| 789 | 0 | NA |
| 790 | 0 | NA |
| 791 | 0 | NA |
| 792 | 0 | NA |
| 793 | 0 | NA |
| 794 | 0 | NA |
| 795 | 0 | NA |
| 796 | 0 | NA |
| 797 | 0 | NA |
| 798 | 0 | NA |
| 799 | 0 | NA |
| 800 | 0 | NA |
| 801 | 0 | NA |
| 802 | 0 | NA |
| 803 | 0 | NA |
| 804 | 0 | NA |
| 805 | 0 | NA |
| 806 | 0 | NA |
| 807 | 0 | NA |
| 808 | 0 | NA |
| 809 | 0 | NA |
| 810 | 0 | NA |
| 812 | 0 | NA |
| 813 | 0 | NA |
| 814 | 0 | NA |
| 815 | 0 | NA |
| 816 | 0 | NA |
| 817 | 0 | NA |
| 818 | 0 | NA |
| 819 | 0 | NA |
| 820 | 0 | NA |
| 821 | 0 | NA |
| 822 | 0 | NA |
| 823 | 0 | NA |
| 824 | 0 | NA |
| 825 | 0 | NA |
| 826 | 0 | NA |
| 827 | 0 | NA |
| 828 | 0 | NA |
| 829 | 0 | NA |
| 830 | 0 | NA |
| 831 | 0 | NA |
| 832 | 0 | NA |
| 833 | 0 | NA |
| 834 | 0 | NA |
| 835 | 0 | NA |
| 836 | 0 | NA |
| 837 | 0 | NA |
| 838 | 0 | NA |
| 839 | 0 | NA |
| 840 | 0 | NA |
| 841 | 0 | NA |
| 842 | 0 | NA |
| 843 | 0 | NA |
| 844 | 0 | NA |
| 845 | 0 | NA |
| 846 | 0 | NA |
| 847 | 0 | NA |
| 848 | 0 | NA |
| 849 | 0 | NA |
| 850 | 0 | NA |
| 851 | 0 | NA |
| 852 | 0 | NA |
| 853 | 0 | NA |
| 854 | 0 | NA |
| 855 | 0 | NA |
| 856 | 0 | NA |
| 857 | 0 | NA |
| 858 | 0 | NA |
| 859 | 0 | NA |
| 860 | 0 | NA |
| 861 | 0 | NA |
| 862 | 0 | NA |
| 863 | 0 | NA |
| 864 | 0 | NA |
| 865 | 0 | NA |
| 866 | 0 | NA |
| 867 | 0 | NA |
| 868 | 0 | NA |
| 869 | 0 | NA |
| 870 | 0 | NA |
| 871 | 0 | NA |
| 872 | 0 | NA |
| 873 | 0 | NA |
| 874 | 0 | NA |
| 875 | 0 | NA |
| 876 | 0 | NA |
| 877 | 0 | NA |
| 878 | 0 | NA |
| 879 | 0 | NA |
| 880 | 0 | NA |
| 881 | 0 | NA |
| 882 | 0 | NA |
| 883 | 0 | NA |
| 884 | 0 | NA |
| 885 | 0 | NA |
| 886 | 0 | NA |
| 887 | 0 | NA |
| 888 | 0 | NA |
| 889 | 0 | NA |
| 890 | 0 | NA |
| 891 | 0 | NA |
| 892 | 0 | NA |
| 893 | 0 | NA |
| 894 | 0 | NA |
| 895 | 0 | NA |
| 896 | 0 | NA |
| 897 | 0 | NA |
| 898 | 0 | NA |
| 899 | 0 | NA |
| 900 | 0 | NA |
| 901 | 0 | NA |
| 902 | 0 | NA |
| 903 | 0 | NA |
| 904 | 0 | NA |
| 905 | 0 | NA |
| 906 | 0 | NA |
| 907 | 0 | NA |
| 908 | 0 | NA |
| 909 | 0 | NA |
| 910 | 0 | NA |
| 911 | 0 | NA |
| 912 | 0 | NA |
| 913 | 0 | NA |
| 914 | 0 | NA |
| 915 | 0 | NA |
| 916 | 0 | NA |
| 917 | 0 | NA |
| 918 | 0 | NA |
| 919 | 0 | NA |
| 920 | 0 | NA |
| 921 | 0 | NA |
| 922 | 0 | NA |
| 923 | 0 | NA |
| 924 | 0 | NA |
| 925 | 0 | NA |
| 926 | 0 | NA |
| 927 | 0 | NA |
| 928 | 0 | NA |
| 929 | 0 | NA |
| 930 | 0 | NA |
| 931 | 0 | NA |
| 932 | 0 | NA |
| 933 | 0 | NA |
| 934 | 0 | NA |
| 935 | 0 | NA |
| 936 | 0 | NA |
| 937 | 0 | NA |
| 938 | 0 | NA |
| 939 | 0 | NA |
| 940 | 0 | NA |
| 941 | 0 | NA |
| 942 | 0 | NA |
| 943 | 0 | NA |
| 944 | 0 | NA |
| 945 | 0 | NA |
| 946 | 0 | NA |
| 947 | 0 | NA |
| 948 | 0 | NA |
| 949 | 0 | NA |
| 950 | 0 | NA |
| 951 | 0 | NA |
| 952 | 0 | NA |
| 953 | 0 | NA |
| 954 | 0 | NA |
| 955 | 0 | NA |
| 956 | 0 | NA |
| 957 | 0 | NA |
| 958 | 0 | NA |
| 959 | 0 | NA |
| 960 | 0 | NA |
| 961 | 0 | NA |
| 962 | 0 | NA |
| 963 | 0 | NA |
| 964 | 0 | NA |
| 965 | 0 | NA |
| 966 | 0 | NA |
| 967 | 0 | NA |
| 968 | 0 | NA |
| 969 | 0 | NA |
| 970 | 0 | NA |
| 971 | 0 | NA |
| 972 | 0 | NA |
| 973 | 0 | NA |
| 974 | 0 | NA |
| 975 | 0 | NA |
| 976 | 0 | NA |
| 977 | 0 | NA |
| 978 | 0 | NA |
| 979 | 0 | NA |
| 980 | 0 | NA |
| 981 | 0 | NA |
| 982 | 0 | NA |
| 983 | 0 | NA |
| 984 | 0 | NA |
| 985 | 0 | NA |
| 986 | 0 | NA |
| 987 | 0 | NA |
| 988 | 0 | NA |
| 989 | 0 | NA |
| 990 | 0 | NA |
| 991 | 0 | NA |
| 992 | 0 | NA |
| 993 | 0 | NA |
| 994 | 0 | NA |
| 995 | 0 | NA |
| 996 | 0 | NA |
| 997 | 0 | NA |
| 998 | 0 | NA |
| 999 | 0 | NA |
| 1000 | 0 | NA |
| 1001 | 0 | NA |
| 1002 | 0 | NA |
| 1003 | 0 | NA |
| 1004 | 0 | NA |
| 1005 | 0 | NA |
| 1006 | 0 | NA |
| 1007 | 0 | NA |
| 1008 | 0 | NA |
| 1009 | 0 | NA |
| 1010 | 0 | NA |
| 1011 | 0 | NA |
| 1012 | 0 | NA |
| 1013 | 0 | NA |
| 1014 | 0 | NA |
| 1015 | 0 | NA |
| 1016 | 0 | NA |
| 1017 | 0 | NA |
| 1018 | 0 | NA |
| 1019 | 0 | NA |
| 1020 | 0 | NA |
| 1021 | 0 | NA |
| 1022 | 0 | NA |
| 1023 | 0 | NA |
| 1024 | 0 | NA |
| 1025 | 0 | NA |
| 1026 | 0 | NA |
| 1027 | 0 | NA |
| 1028 | 0 | NA |
| 1029 | 0 | NA |
| 1030 | 0 | NA |
| 1031 | 0 | NA |
| 1032 | 0 | NA |
| 1033 | 0 | NA |
| 1034 | 0 | NA |
| 1035 | 0 | NA |
| 1036 | 0 | NA |
| 1037 | 0 | NA |
| 1038 | 0 | NA |
| 1039 | 0 | NA |
| 1040 | 0 | NA |
| 1041 | 0 | NA |
| 1042 | 0 | NA |
| 1043 | 0 | NA |
| 1044 | 0 | NA |
| 1045 | 0 | NA |
| 1046 | 0 | NA |
| 1047 | 0 | NA |
| 1048 | 0 | NA |
| 1049 | 0 | NA |
| 1050 | 0 | NA |
| 1051 | 0 | NA |
| 1052 | 0 | NA |
| 1053 | 0 | NA |
| 1054 | 0 | NA |
| 1055 | 0 | NA |
| 1056 | 0 | NA |
| 1057 | 0 | NA |
| 1058 | 0 | NA |
| 1059 | 0 | NA |
| 1060 | 0 | NA |
| 1061 | 0 | NA |
| 1062 | 0 | NA |
| 1063 | 0 | NA |
| 1064 | 0 | NA |
| 1065 | 0 | NA |
| 1066 | 0 | NA |
| 1067 | 0 | NA |
| 1068 | 0 | NA |
| 1069 | 0 | NA |
| 1070 | 0 | NA |
| 1071 | 0 | NA |
| 1072 | 0 | NA |
| 1073 | 0 | NA |
| 1074 | 0 | NA |
| 1075 | 0 | NA |
| 1076 | 0 | NA |
| 1077 | 0 | NA |
| 1078 | 0 | NA |
| 1079 | 0 | NA |
| 1080 | 0 | NA |
| 1081 | 0 | NA |
| 1082 | 0 | NA |
| 1083 | 0 | NA |
| 1084 | 0 | NA |
| 1085 | 0 | NA |
| 1086 | 0 | NA |
| 1087 | 0 | NA |
| 1088 | 0 | NA |
| 1089 | 0 | NA |
| 1090 | 0 | NA |
| 1091 | 0 | NA |
| 1092 | 0 | NA |
| 1093 | 0 | NA |
| 1094 | 0 | NA |
| 1095 | 0 | NA |
| 1096 | 0 | NA |
| 1097 | 0 | NA |
| 1098 | 0 | NA |
| 1099 | 0 | NA |
| 1100 | 0 | NA |
| 1101 | 0 | NA |
| 1102 | 0 | NA |
| 1103 | 0 | NA |
| 1104 | 0 | NA |
| 1105 | 0 | NA |
| 1106 | 0 | NA |
| 1107 | 0 | NA |
| 1108 | 0 | NA |
| 1109 | 0 | NA |
| 1110 | 0 | NA |
| 1111 | 0 | NA |
| 1112 | 0 | NA |
| 1113 | 0 | NA |
| 1114 | 0 | NA |
| 1115 | 0 | NA |
| 1116 | 0 | NA |
| 1117 | 0 | NA |
| 1118 | 0 | NA |
| 1119 | 0 | NA |
| 1120 | 0 | NA |
| 1121 | 0 | NA |
| 1122 | 0 | NA |
| 1123 | 0 | NA |
| 1124 | 0 | NA |
| 1125 | 0 | NA |
| 1126 | 0 | NA |
| 1127 | 0 | NA |
| 1128 | 0 | NA |
| 1129 | 0 | NA |
| 1130 | 0 | NA |
| 1131 | 0 | NA |
| 1132 | 0 | NA |
| 1133 | 0 | NA |
| 1134 | 0 | NA |
| 1135 | 0 | NA |
| 1136 | 0 | NA |
| 1137 | 0 | NA |
| 1138 | 0 | NA |
| 1139 | 0 | NA |
| 1140 | 0 | NA |
| 1141 | 0 | NA |
| 1142 | 0 | NA |
| 1143 | 0 | NA |
| 1144 | 0 | NA |
| 1145 | 0 | NA |
| 1146 | 0 | NA |
| 1147 | 0 | NA |
| 1148 | 0 | NA |
| 1149 | 0 | NA |
| 1150 | 0 | NA |
| 1151 | 0 | NA |
| 1152 | 0 | NA |
| 1153 | 0 | NA |
| 1154 | 0 | NA |
| 1155 | 0 | NA |
| 1156 | 0 | NA |
| 1157 | 0 | NA |
| 1158 | 0 | NA |
| 1159 | 0 | NA |
| 1160 | 0 | NA |
| 1161 | 0 | NA |
| 1162 | 0 | NA |
| 1163 | 0 | NA |
| 1164 | 0 | NA |
| 1165 | 0 | NA |
| 1166 | 0 | NA |
| 1167 | 0 | NA |
| 1168 | 0 | NA |
| 1169 | 0 | NA |
| 1170 | 0 | NA |
| 1172 | 0 | NA |
| 1173 | 0 | NA |
| 1174 | 0 | NA |
| 1175 | 0 | NA |
| 1176 | 0 | NA |
| 1177 | 0 | NA |
| 1178 | 0 | NA |
| 1179 | 0 | NA |
| 1180 | 0 | NA |
| 1181 | 0 | NA |
| 1182 | 0 | NA |
| 1184 | 0 | NA |
| 1185 | 0 | NA |
| 1186 | 0 | NA |
| 1187 | 0 | NA |
| 1188 | 0 | NA |
| 1189 | 0 | NA |
| 1190 | 0 | NA |
| 1191 | 0 | NA |
| 1192 | 0 | NA |
| 1193 | 0 | NA |
| 1194 | 0 | NA |
| 1195 | 0 | NA |
| 1196 | 0 | NA |
| 1197 | 0 | NA |
| 1198 | 0 | NA |
| 1199 | 0 | NA |
| 1200 | 0 | NA |
| 1201 | 0 | NA |
| 1202 | 0 | NA |
| 1203 | 0 | NA |
| 1204 | 0 | NA |
| 1205 | 0 | NA |
| 1206 | 0 | NA |
| 1207 | 0 | NA |
| 1208 | 0 | NA |
| 1209 | 0 | NA |
| 1210 | 0 | NA |
| 1211 | 0 | NA |
| 1212 | 0 | NA |
| 1213 | 0 | NA |
| 1214 | 0 | NA |
| 1215 | 0 | NA |
| 1216 | 0 | NA |
| 1217 | 0 | NA |
| 1218 | 0 | NA |
| 1219 | 0 | NA |
| 1220 | 0 | NA |
| 1221 | 0 | NA |
| 1222 | 0 | NA |
| 1223 | 0 | NA |
| 1224 | 0 | NA |
| 1225 | 0 | NA |
| 1226 | 0 | NA |
| 1227 | 0 | NA |
| 1228 | 0 | NA |
| 1229 | 0 | NA |
| 1230 | 0 | NA |
| 1231 | 0 | NA |
| 1232 | 0 | NA |
| 1233 | 0 | NA |
| 1234 | 0 | NA |
| 1235 | 0 | NA |
| 1236 | 0 | NA |
| 1237 | 0 | NA |
| 1238 | 0 | NA |
| 1239 | 0 | NA |
| 1240 | 0 | NA |
| 1241 | 0 | NA |
| 1242 | 0 | NA |
| 1243 | 0 | NA |
| 1244 | 0 | NA |
| 1245 | 0 | NA |
| 1246 | 0 | NA |
| 1247 | 0 | NA |
| 1248 | 0 | NA |
| 1249 | 0 | NA |
| 1250 | 0 | NA |
| 1251 | 0 | NA |
| 1252 | 0 | NA |
| 1253 | 0 | NA |
| 1254 | 0 | NA |
| 1255 | 0 | NA |
| 1256 | 0 | NA |
| 1257 | 0 | NA |
| 1258 | 0 | NA |
| 1259 | 0 | NA |
| 1260 | 0 | NA |
| 1261 | 0 | NA |
| 1262 | 0 | NA |
| 1263 | 0 | NA |
| 1264 | 0 | NA |
| 1265 | 0 | NA |
| 1266 | 0 | NA |
| 1267 | 0 | NA |
| 1268 | 0 | NA |
| 1269 | 0 | NA |
| 1270 | 0 | NA |
| 1271 | 0 | NA |
| 1272 | 0 | NA |
| 1273 | 0 | NA |
| 1274 | 0 | NA |
| 1275 | 0 | NA |
| 1276 | 0 | NA |
| 1277 | 0 | NA |
| 1278 | 0 | NA |
| 1279 | 0 | NA |
| 1280 | 0 | NA |
| 1281 | 0 | NA |
| 1282 | 0 | NA |
| 1283 | 0 | NA |
| 1284 | 0 | NA |
| 1285 | 0 | NA |
| 1286 | 0 | NA |
| 1287 | 0 | NA |
| 1288 | 0 | NA |
| 1289 | 0 | NA |
| 1290 | 0 | NA |
| 1291 | 0 | NA |
| 1292 | 0 | NA |
| 1293 | 0 | NA |
| 1294 | 0 | NA |
| 1295 | 0 | NA |
| 1296 | 0 | NA |
| 1297 | 0 | NA |
| 1298 | 0 | NA |
| 1300 | 0 | NA |
| 1301 | 0 | NA |
| 1302 | 0 | NA |
| 1303 | 0 | NA |
| 1304 | 0 | NA |
| 1305 | 0 | NA |
| 1306 | 0 | NA |
| 1307 | 0 | NA |
| 1308 | 0 | NA |
| 1309 | 0 | NA |
| 1310 | 0 | NA |
| 1311 | 0 | NA |
| 1312 | 0 | NA |
| 1313 | 0 | NA |
| 1314 | 0 | NA |
| 1315 | 0 | NA |
| 1316 | 0 | NA |
| 1317 | 0 | NA |
| 1318 | 0 | NA |
| 1319 | 0 | NA |
| 1320 | 0 | NA |
| 1321 | 0 | NA |
| 1322 | 0 | NA |
| 1323 | 0 | NA |
| 1324 | 0 | NA |
| 1325 | 0 | NA |
| 1326 | 0 | NA |
| 1327 | 0 | NA |
| 1328 | 0 | NA |
| 1329 | 0 | NA |
| 1330 | 0 | NA |
| 1331 | 0 | NA |
| 1332 | 0 | NA |
| 1333 | 0 | NA |
| 1334 | 0 | NA |
| 1335 | 0 | NA |
| 1336 | 0 | NA |
| 1337 | 0 | NA |
| 1338 | 0 | NA |
| 1339 | 0 | NA |
| 1340 | 0 | NA |
| 1341 | 0 | NA |
| 1342 | 0 | NA |
| 1343 | 0 | NA |
| 1344 | 0 | NA |
| 1345 | 0 | NA |
| 1346 | 0 | NA |
| 1347 | 0 | NA |
| 1348 | 0 | NA |
| 1349 | 0 | NA |
| 1350 | 0 | NA |
| 1351 | 0 | NA |
| 1352 | 0 | NA |
| 1353 | 0 | NA |
| 1354 | 0 | NA |
| 1355 | 0 | NA |
| 1356 | 0 | NA |
| 1357 | 0 | NA |
| 1358 | 0 | NA |
| 1359 | 0 | NA |
| 1360 | 0 | NA |
| 1361 | 0 | NA |
| 1362 | 0 | NA |
| 1363 | 0 | NA |
| 1364 | 0 | NA |
| 1365 | 0 | NA |
| 1366 | 0 | NA |
| 1367 | 0 | NA |
| 1368 | 0 | NA |
| 1369 | 0 | NA |
| 1370 | 0 | NA |
| 1371 | 0 | NA |
| 1372 | 0 | NA |
| 1373 | 0 | NA |
| 1374 | 0 | NA |
| 1375 | 0 | NA |
| 1376 | 0 | NA |
| 1377 | 0 | NA |
| 1378 | 0 | NA |
| 1379 | 0 | NA |
| 1380 | 0 | NA |
| 1381 | 0 | NA |
| 1382 | 0 | NA |
| 1383 | 0 | NA |
| 1384 | 0 | NA |
| 1385 | 0 | NA |
| 1386 | 0 | NA |
| 1388 | 0 | NA |
| 1389 | 0 | NA |
| 1390 | 0 | NA |
| 1391 | 0 | NA |
| 1392 | 0 | NA |
| 1393 | 0 | NA |
| 1394 | 0 | NA |
| 1395 | 0 | NA |
| 1396 | 0 | NA |
| 1397 | 0 | NA |
| 1398 | 0 | NA |
| 1399 | 0 | NA |
| 1400 | 0 | NA |
| 1401 | 0 | NA |
| 1402 | 0 | NA |
| 1403 | 0 | NA |
| 1404 | 0 | NA |
| 1405 | 0 | NA |
| 1406 | 0 | NA |
| 1407 | 0 | NA |
| 1408 | 0 | NA |
| 1409 | 0 | NA |
| 1410 | 0 | NA |
| 1411 | 0 | NA |
| 1412 | 0 | NA |
| 1413 | 0 | NA |
| 1414 | 0 | NA |
| 1415 | 0 | NA |
| 1416 | 0 | NA |
| 1417 | 0 | NA |
| 1418 | 0 | NA |
| 1419 | 0 | NA |
| 1420 | 0 | NA |
| 1421 | 0 | NA |
| 1422 | 0 | NA |
| 1423 | 0 | NA |
| 1425 | 0 | NA |
| 1426 | 0 | NA |
| 1427 | 0 | NA |
| 1428 | 0 | NA |
| 1429 | 0 | NA |
| 1430 | 0 | NA |
| 1431 | 0 | NA |
| 1432 | 0 | NA |
| 1433 | 0 | NA |
| 1434 | 0 | NA |
| 1435 | 0 | NA |
| 1436 | 0 | NA |
| 1437 | 0 | NA |
| 1438 | 0 | NA |
| 1439 | 0 | NA |
| 1440 | 0 | NA |
| 1441 | 0 | NA |
| 1442 | 0 | NA |
| 1443 | 0 | NA |
| 1444 | 0 | NA |
| 1445 | 0 | NA |
| 1446 | 0 | NA |
| 1447 | 0 | NA |
| 1448 | 0 | NA |
| 1449 | 0 | NA |
| 1450 | 0 | NA |
| 1451 | 0 | NA |
| 1452 | 0 | NA |
| 1453 | 0 | NA |
| 1454 | 0 | NA |
| 1455 | 0 | NA |
| 1456 | 0 | NA |
| 1457 | 0 | NA |
| 1458 | 0 | NA |
| 1459 | 0 | NA |
| 1460 | 0 | NA |
Ajuste e criação de Dummies
# mudando NA para 0 pensando no futuro, assumindo que o R vai jogar um dos fatores (geralmente em ordem alfabética) como base quando calculamos a regressão. E, no caso, NA para essa variável não é valor faltante, é uma resposta válidade.
# Alley
ds2_all$Alley_ <- ifelse(is.na(ds2_all$Alley), 0, ds2_all$Alley)
#ds2_all[c(22:32),c("Alley","Alley_")]
# BsmtQual / BsmtCond / BsmtExposure / BsmtFinType1 / BsmtFinType2
# existência de porão
ds2_all$BsmtDummy_ <- ifelse(is.na(ds2_all$BsmtQual) &
is.na(ds2_all$BsmtCond) &
is.na(ds2_all$BsmtExposure) &
is.na(ds2_all$BsmtFinType1) &
ds2_all$BsmtFinSF1 == 0 &
is.na(ds2_all$BsmtFinType2) &
ds2_all$BsmtFinSF2 == 0 &
ds2_all$BsmtUnfSF == 0 &
ds2_all$TotalBsmtSF == 0, 0, 1) #binary
ds2_all$BsmtQual_ <- ifelse(ds2_all$BsmtDummy_ == 0 & is.na(ds2_all$BsmtQual), 0, ds2_all$BsmtQual)
#ds2_all[c(15:25),c("BsmtQual","BsmtQual_")]
ds2_all$BsmtCond_ <- ifelse(ds2_all$BsmtDummy_ == 0 & is.na(ds2_all$BsmtCond), 0, ds2_all$BsmtCond)
ds2_all$BsmtExposure_ <- ifelse(ds2_all$BsmtDummy_ == 0 & is.na(ds2_all$BsmtExposure), 0, ds2_all$BsmtExposure)
ds2_all$BsmtFinType1_ <- ifelse(ds2_all$BsmtDummy_ == 0 & is.na(ds2_all$BsmtFinType1), 0, ds2_all$BsmtFinType1)
ds2_all$BsmtFinType2_ <- ifelse(ds2_all$BsmtDummy_ == 0 & is.na(ds2_all$BsmtFinType2), 0, ds2_all$BsmtFinType2)
# MasVnrTyp
# existência de alvenaria
ds2_all$MasVnrDummy_ <- ifelse(ds2_all$MasVnrType == "None" &
ds2_all$MasVnrArea == 0, 0, 1) #binary
ds2_all$MasVnrType_ <- ifelse(ds2_all$MasVnrType == "None" , 0, ds2_all$MasVnrType)
# FireplaceQu
# existência de lareira
ds2_all$FirePlaceDummy_ <- ifelse(ds2_all$Fireplaces == 0 &
is.na(ds2_all$FireplaceQu), 0, 1) #binary
ds2_all$FireplaceQu_ <- ifelse(ds2_all$FirePlaceDummy_ == 0 &
is.na(ds2_all$FireplaceQu), 0, ds2_all$FireplaceQu)
# GarageType / GarageFinish / GarageQual / GarageCond
# existência de garagem
ds2_all$GarageDummy_ <- ifelse(is.na(ds2_all$GarageType) &
is.na(ds2_all$GarageYrBlt) &
is.na(ds2_all$GarageFinish) &
ds2_all$GarageCars == 0 &
ds2_all$GarageArea == 0 &
is.na(ds2_all$GarageQual) &
is.na(ds2_all$GarageCond) , 0, 1) #binary
ds2_all$GarageType_ <- ifelse(ds2_all$GarageDummy_ == 0 & is.na(ds2_all$GarageType), 0, ds2_all$GarageType)
ds2_all$GarageFinish_ <- ifelse(ds2_all$GarageDummy_ == 0 & is.na(ds2_all$GarageFinish), 0, ds2_all$GarageFinish)
ds2_all$GarageQual_ <- ifelse(ds2_all$GarageDummy_ == 0 & is.na(ds2_all$GarageQual), 0, ds2_all$GarageQual)
ds2_all$GarageCond_ <- ifelse(ds2_all$GarageDummy_ == 0 & is.na(ds2_all$GarageCond), 0, ds2_all$GarageCond)
# PoolQC
# existência de piscina
ds2_all$PoolDummy_ <- ifelse(ds2_all$PoolArea == 0 & is.na(ds2_all$PoolQC), 0, 1) #binary
ds2_all$PoolQC_ <- ifelse(ds2_all$PoolDummy_ == 0 & is.na(ds2_all$PoolQC), 0, ds2_all$PoolQC)
# Fence
# existência de cerca
ds2_all$FenceDummy_ <- ifelse(is.na(ds2_all$Fence), 0, 1) #binary
ds2_all$Fence_ <- ifelse(is.na(ds2_all$Fence), 0, ds2_all$Fence)
# MiscFeature
# existência de coisas variadas
ds2_all$MiscFeatureDummy_ <- ifelse(is.na(ds2_all$MiscFeature), 0, 1) #binary
ds2_all$MiscFeature_ <- ifelse(is.na(ds2_all$MiscFeature), 0, ds2_all$MiscFeature)
# CentralAir
ds2_all <- ds2_all %>%
mutate(
CentralAir_ = as.factor(case_when(
CentralAir == "Y" ~ "Yes",
CentralAir == "N" ~ "No"
))
)Criação de fatores
# Transformando as variáveis numericas que são fatores em fatores.
# tipo de moradia
ds2_all$MSSubClass_ <- as.factor(ds2_all$MSSubClass)
### Likert 10
likert_10levels <- factor(c(1:10), ordered = TRUE)
# escala de Likert - OverallQual
ds2_all$OverallQual_ <- factor(ds2_all$OverallQual, levels = likert_10levels, ordered = TRUE)
# escala de Likert - OverallCond
ds2_all$OverallCond_ <- factor(ds2_all$OverallCond, levels = likert_10levels, ordered = TRUE)
### Likert 5
likert_5levels_a <- factor(c(1:5), labels = c("Po", "Fa", "TA", "Gd", "Ex") , ordered = TRUE)
likert_5levels_b <- factor(c(0:4), labels = c("0", "No", "Mn", "Av", "Gd") , ordered = TRUE)
# escala de Likert A - ExterQual
ds2_all$ExterQual_ <- factor(ds2_all$ExterQual, levels = likert_5levels_a, ordered = TRUE)
# escala de Likert A - ExterCond
ds2_all$ExterCond_ <- factor(ds2_all$ExterCond, levels = likert_5levels_a, ordered = TRUE)
# escala de Likert B - BsmtExposure_
ds2_all$BsmtExposure_ <- factor(ds2_all$BsmtExposure_, levels = likert_5levels_b, ordered = TRUE)
# escala de Likert A - HeatingQC
ds2_all$HeatingQC_ <- factor(ds2_all$HeatingQC, levels = likert_5levels_a, ordered = TRUE)
# escala de Likert A - KitchenQual
ds2_all$KitchenQual_ <- factor(ds2_all$KitchenQual, levels = likert_5levels_a, ordered = TRUE)
### Likert 6
likert_6levels <- factor(c(0:5), labels = c("0","Po", "Fa", "TA", "Gd", "Ex") , ordered = TRUE)
# escala de Likert - BsmtQual_
ds2_all$BsmtQual_ <- factor(ds2_all$BsmtQual_, levels = likert_6levels, ordered = TRUE)
# escala de Likert - BsmtCond_
ds2_all$BsmtCond_ <- factor(ds2_all$BsmtCond_, levels = likert_6levels, ordered = TRUE)
# escala de Likert - FireplaceQu_
ds2_all$FireplaceQu_ <- factor(ds2_all$FireplaceQu_, levels = likert_6levels, ordered = TRUE)
# escala de Likert - GarageQual_
ds2_all$GarageQual_ <- factor(ds2_all$GarageQual_, levels = likert_6levels, ordered = TRUE)
# escala de Likert - GarageCond_
ds2_all$GarageCond_ <- factor(ds2_all$GarageCond_, levels = likert_6levels, ordered = TRUE)
# escala de Likert - PoolQC_
ds2_all$PoolQC_ <- factor(ds2_all$PoolQC_, levels = likert_6levels, ordered = TRUE)
### Likert 7
likert_7levels <- factor(c(0:6), labels = c("0", "Unf", "LwQ", "Rec", "BLQ", "ALQ", "GLQ") , ordered = TRUE)
# escala de Likert - BsmtFinType1_
ds2_all$BsmtFinType1_ <- factor(ds2_all$BsmtFinType1_, levels = likert_7levels, ordered = TRUE)
# escala de Likert - BsmtFinType2_
ds2_all$BsmtFinType2_ <- factor(ds2_all$BsmtFinType2_, levels = likert_7levels, ordered = TRUE)Summary
Após feitas as mudanças, podemos ver que o número de valores faltantes reais nas variáveis categoricas diminutiam drasticamente quando comparado com as variáveis originais.
#h_train1_summary1 <- as.data.frame(which( colSums( is.na(ds2_all[ds2_all$table == "train",]) ) > 0 ))
#colnames(h_train1_summary1) <- c("Missing Observations")
h_train1_summary1 <- as.data.frame(t(ds2_all[ds2_all$table == "train",] %>%
summarise(across(everything(), ~ sum(is.na(.x))))))
colnames(h_train1_summary1) <- c("Missing Observations")
h_train1_summary1 %>% arrange(rownames(h_train1_summary1)) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Missing Observations | |
|---|---|
| 1stFlrSF | 0 |
| 2ndFlrSF | 0 |
| 3SsnPorch | 0 |
| Alley | 1369 |
| Alley_ | 0 |
| BedroomAbvGr | 0 |
| BldgType | 0 |
| BsmtCond | 37 |
| BsmtCond_ | 0 |
| BsmtDummy_ | 0 |
| BsmtExposure | 38 |
| BsmtExposure_ | 1 |
| BsmtFinSF1 | 0 |
| BsmtFinSF2 | 0 |
| BsmtFinType1 | 37 |
| BsmtFinType1_ | 0 |
| BsmtFinType2 | 38 |
| BsmtFinType2_ | 1 |
| BsmtFullBath | 0 |
| BsmtHalfBath | 0 |
| BsmtQual | 37 |
| BsmtQual_ | 0 |
| BsmtUnfSF | 0 |
| CentralAir | 0 |
| CentralAir_ | 0 |
| Condition1 | 0 |
| Condition2 | 0 |
| Electrical | 1 |
| EnclosedPorch | 0 |
| ExterCond | 0 |
| ExterCond_ | 0 |
| Exterior1st | 0 |
| Exterior2nd | 0 |
| ExterQual | 0 |
| ExterQual_ | 0 |
| Fence | 1179 |
| Fence_ | 0 |
| FenceDummy_ | 0 |
| FirePlaceDummy_ | 0 |
| FireplaceQu | 690 |
| FireplaceQu_ | 0 |
| Fireplaces | 0 |
| Foundation | 0 |
| FullBath | 0 |
| Functional | 0 |
| GarageArea | 0 |
| GarageCars | 0 |
| GarageCond | 81 |
| GarageCond_ | 0 |
| GarageDummy_ | 0 |
| GarageFinish | 81 |
| GarageFinish_ | 0 |
| GarageQual | 81 |
| GarageQual_ | 0 |
| GarageType | 81 |
| GarageType_ | 0 |
| GarageYrBlt | 81 |
| GrLivArea | 0 |
| HalfBath | 0 |
| Heating | 0 |
| HeatingQC | 0 |
| HeatingQC_ | 0 |
| HouseStyle | 0 |
| Id | 0 |
| KitchenAbvGr | 0 |
| KitchenQual | 0 |
| KitchenQual_ | 0 |
| LandContour | 0 |
| LandSlope | 0 |
| LotArea | 0 |
| LotConfig | 0 |
| LotFrontage | 259 |
| LotShape | 0 |
| LowQualFinSF | 0 |
| MasVnrArea | 8 |
| MasVnrDummy_ | 8 |
| MasVnrType | 8 |
| MasVnrType_ | 8 |
| MiscFeature | 1406 |
| MiscFeature_ | 0 |
| MiscFeatureDummy_ | 0 |
| MiscVal | 0 |
| MoSold | 0 |
| MSSubClass | 0 |
| MSSubClass_ | 0 |
| MSZoning | 0 |
| Neighborhood | 0 |
| OpenPorchSF | 0 |
| OverallCond | 0 |
| OverallCond_ | 0 |
| OverallQual | 0 |
| OverallQual_ | 0 |
| PavedDrive | 0 |
| PoolArea | 0 |
| PoolDummy_ | 0 |
| PoolQC | 1453 |
| PoolQC_ | 0 |
| RoofMatl | 0 |
| RoofStyle | 0 |
| SaleCondition | 0 |
| SalePrice | 0 |
| SaleType | 0 |
| ScreenPorch | 0 |
| Street | 0 |
| table | 0 |
| TotalBsmtSF | 0 |
| TotRmsAbvGrd | 0 |
| Utilities | 0 |
| WoodDeckSF | 0 |
| YearBuilt | 0 |
| YearRemodAdd | 0 |
| YrSold | 0 |
Resumo dos Dados
Variável Dependente
Sabemos que nossa variável dependente SalePrice é
quantitativa contínua, então podemos fazeruma análise inicial
dela.
# Histograma
h_train1_chart_resp_1 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = SalePrice)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20) +
ggtitle("Histograma do Preço de Venda da casa") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("Preço Venda") +
ylab("Frequência")
#ggplotly(h_train1_chart_resp_1)
# Histograma e Densidade
h_train1_chart_resp_2 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = SalePrice)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20,
aes(y = (..count..) / sum(..count..))) +
geom_density(col = 2, size = 1, aes(y = 20 * (..count..) / sum(..count..))) +
ggtitle("Histograma e curva de densidade do Preço de Venda da casa") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("Preço Venda") +
ylab("Frequência relativa")
#ggplotly(h_train1_chart_resp_2)
# QQPlot
h_train1_chart_resp_3 <- ggplot(ds2_all[ds2_all$table == "train",], aes(sample = SalePrice)) +
stat_qq_point(size = 2, color = "lightblue", pch=19) +
stat_qq_line(color="red", size=0.5) +
ggtitle("Normal Q-Q Plot do Preço de Venda da casa") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("Theoretical") + ylab("Sample")
#ggplotly(h_train1_chart_resp_3)
grid.arrange(h_train1_chart_resp_1,
h_train1_chart_resp_2,
h_train1_chart_resp_3,
nrow = 2,
ncol = 2)Visualmente, temos que essa variável é assimétrica à direita (positiva).
# verificando o nivel de assimetria
skewness(ds2_all[ds2_all$table == "train",]$SalePrice)## [1] 1.879009
O cálculo feito acima comprova que de fato a variável é assimétrica à direita.
# verificando a normalidade da variável
shapiro.test(ds2_all[ds2_all$table == "train",]$SalePrice)##
## Shapiro-Wilk normality test
##
## data: ds2_all[ds2_all$table == "train", ]$SalePrice
## W = 0.86967, p-value < 0.00000000000000022
Acima fizemos o Teste Shapiro-Wilk para Normalidade dessa variável. As hipóteses do teste são:
\[ \begin{cases} H_0: \text{ A amostra veio de uma população normalmente distribuída} \\ H_1: \text{ A amostra não veio de uma população normalmente distribuída} \end{cases} \]
e como p-valor < 5% o teste rejeita \(H_0\), indicando que a amostra de preços de
venda não veio de uma população normal. Isso nos causar problemas mais a
frente quando formos definir o tipo de modelagem que venhamos a fazer,
já que para a regressão linear, um dos pressupostos é que as variáveis
sejam normais.
Um jeito de resolver esse problema seria usar o log desse valor.
ds2_all$SalePrice_log <- log(ds2_all$SalePrice)# Histograma
h_train1_chart_resp_4 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = SalePrice_log)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20) +
ggtitle("Histograma do log Preço de Venda da casa") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("log Preço Venda") +
ylab("Frequência")
#ggplotly(h_train1_chart_resp_4)
# Histograma e Densidade
h_train1_chart_resp_5 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = SalePrice_log)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20,
aes(y = (..count..) / sum(..count..))) +
geom_density(col = 2, size = 1, aes(y = 20 * (..count..) / sum(..count..))) +
ggtitle("Histograma e curva de densidade do log Preço de Venda da casa") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("log Preço Venda") +
ylab("Frequência relativa")
#ggplotly(h_train1_chart_resp_5)
# QQPlot
h_train1_chart_resp_6 <- ggplot(ds2_all[ds2_all$table == "train",], aes(sample = SalePrice_log)) +
stat_qq_point(size = 2, color = "lightblue", pch=19) +
stat_qq_line(color="red", size=0.5) +
ggtitle("Normal Q-Q Plot do log Preço de Venda da casa") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("Theoretical") + ylab("Sample")
#ggplotly(h_train1_chart_resp_6)
grid.arrange(h_train1_chart_resp_4,
h_train1_chart_resp_5,
h_train1_chart_resp_6,
nrow = 2,
ncol = 2)# verificando o nivel de assimetria
skewness(ds2_all[ds2_all$table == "train",]$SalePrice_log)## [1] 0.1210859
# verificando a normalidade da variável
shapiro.test(ds2_all[ds2_all$table == "train",]$SalePrice_log)##
## Shapiro-Wilk normality test
##
## data: ds2_all[ds2_all$table == "train", ]$SalePrice_log
## W = 0.99121, p-value = 0.0000001149
O teste de Shapiro-Wilk continua rejeitando o log da variável. Mais a
frente vamos verificar novamente após retirarmos outliers, pois podem
ser esses valores extremos mexendo com a normalidade da variável
resposta. Foi possivel notar que o log do preço tem um p-valor maior do
que somente o preço, indicando que usar o log aqui é realmente o caminho
certo.
Abaixo temos algumas das estatísticas descritivas das variáveis
numéricas do banco de dados. Não necessariamente são variáveis
quantitavivas. O objetivo aqui é ver se há valores descrepantes dos
demais.
Train
h_train1_summary2 <- ds2_all[ds2_all$table == "train",] %>%
summarise(across(
.cols = where(is.numeric),
.fns = list(
~ sum(is.na(.x)),
min = min,
median = median,
mean = mean,
max = max,
sd = sd
),
na.rm = TRUE,
.names = "{.col}_{.fn}")
) %>%
pivot_longer(cols = everything()) %>%
mutate(statistic = str_match(name, pattern = ".+_(.+)")[,2],
name = str_match(name, pattern = "(.+)_.+")[,2]) %>%
pivot_wider(names_from = name, values_from = value)
h_train1_summary2 %>% kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| statistic | Id | MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SalePrice | BsmtDummy_ | MasVnrDummy_ | FirePlaceDummy_ | GarageDummy_ | PoolDummy_ | FenceDummy_ | MiscFeatureDummy_ | SalePrice_log |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00 | 0.00000 | 259.00000 | 0.000 | 0.000000 | 0.000000 | 0.0000 | 0.00000 | 8.0000 | 0.0000 | 0.00000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.00000 | 0.0000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.000000 | 0.0000000 | 0.000000 | 0.0000000 | 81.00000 | 0.000000 | 0.0000 | 0.00000 | 0.00000 | 0.00000 | 0.000000 | 0.00000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.0 | 0.0000000 | 8.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
| min | 1.00 | 20.00000 | 21.00000 | 1300.000 | 1.000000 | 1.000000 | 1872.0000 | 1950.00000 | 0.0000 | 0.0000 | 0.00000 | 0.0000 | 0.0000 | 334.0000 | 0.0000 | 0.00000 | 334.0000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.000000 | 0.0000000 | 2.000000 | 0.0000000 | 1900.00000 | 0.000000 | 0.0000 | 0.00000 | 0.00000 | 0.00000 | 0.000000 | 0.00000 | 0.000000 | 0.00000 | 1.000000 | 2006.000000 | 34900.0 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 10.4602421 |
| median | 730.50 | 50.00000 | 69.00000 | 9478.500 | 6.000000 | 5.000000 | 1973.0000 | 1994.00000 | 0.0000 | 383.5000 | 0.00000 | 477.5000 | 991.5000 | 1087.0000 | 0.0000 | 0.00000 | 1464.0000 | 0.0000000 | 0.0000000 | 2.0000000 | 0.0000000 | 3.000000 | 1.0000000 | 6.000000 | 1.0000000 | 1980.00000 | 2.000000 | 480.0000 | 0.00000 | 25.00000 | 0.00000 | 0.000000 | 0.00000 | 0.000000 | 0.00000 | 6.000000 | 2008.000000 | 163000.0 | 1.0000000 | 0.0000000 | 1.0000000 | 1.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 12.0015055 |
| mean | 730.50 | 56.89726 | 70.04996 | 10516.828 | 6.099315 | 5.575342 | 1971.2678 | 1984.86575 | 103.6853 | 443.6397 | 46.54932 | 567.2404 | 1057.4295 | 1162.6267 | 346.9925 | 5.84452 | 1515.4637 | 0.4253425 | 0.0575342 | 1.5650685 | 0.3828767 | 2.866438 | 1.0465753 | 6.517808 | 0.6130137 | 1978.50616 | 1.767123 | 472.9801 | 94.24452 | 46.66027 | 21.95411 | 3.409589 | 15.06096 | 2.758904 | 43.48904 | 6.321918 | 2007.815753 | 180921.2 | 0.9746575 | 0.4084022 | 0.5273973 | 0.9445205 | 0.0047945 | 0.1924658 | 0.0369863 | 12.0240509 |
| max | 1460.00 | 190.00000 | 313.00000 | 215245.000 | 10.000000 | 9.000000 | 2010.0000 | 2010.00000 | 1600.0000 | 5644.0000 | 1474.00000 | 2336.0000 | 6110.0000 | 4692.0000 | 2065.0000 | 572.00000 | 5642.0000 | 3.0000000 | 2.0000000 | 3.0000000 | 2.0000000 | 8.000000 | 3.0000000 | 14.000000 | 3.0000000 | 2010.00000 | 4.000000 | 1418.0000 | 857.00000 | 547.00000 | 552.00000 | 508.000000 | 480.00000 | 738.000000 | 15500.00000 | 12.000000 | 2010.000000 | 755000.0 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 13.5344730 |
| sd | 421.61 | 42.30057 | 24.28475 | 9981.265 | 1.382996 | 1.112799 | 30.2029 | 20.64541 | 181.0662 | 456.0981 | 161.31927 | 441.8670 | 438.7053 | 386.5877 | 436.5284 | 48.62308 | 525.4804 | 0.5189106 | 0.2387526 | 0.5509158 | 0.5028854 | 0.815778 | 0.2203382 | 1.625393 | 0.6446664 | 24.68972 | 0.747315 | 213.8048 | 125.33879 | 66.25603 | 61.11915 | 29.317331 | 55.75742 | 40.177307 | 496.12302 | 2.703626 | 1.328095 | 79442.5 | 0.1572169 | 0.4917076 | 0.4994199 | 0.2289921 | 0.0691000 | 0.3943719 | 0.1887928 | 0.3994519 |
Obs: Na tabela acima a
linha statistic = 1 representa a quantidade de
valores faltantes no banco de treino.
Note que de todas as variáveis numéricas do banco,
- MasVnrArea: contém 8 valores faltantes dos 1460 valores totais da base
de treino, representando 0.5 %.
- LotFrontage: contém 259 valores faltantes dos 1460 valores totais da
base de treino, representando 17.7 %.
Test
h_test1_summary2 <- ds2_all[ds2_all$table == "test",] %>%
summarise(across(
.cols = where(is.numeric),
.fns = list(
~ sum(is.na(.x)),
min = min,
median = median,
mean = mean,
max = max,
sd = sd
),
na.rm = TRUE,
.names = "{.col}_{.fn}")
) %>%
pivot_longer(cols = everything()) %>%
mutate(statistic = str_match(name, pattern = ".+_(.+)")[,2],
name = str_match(name, pattern = "(.+)_.+")[,2]) %>%
pivot_wider(names_from = name, values_from = value)
h_test1_summary2 %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| statistic | Id | MSSubClass | LotFrontage | LotArea | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SalePrice | BsmtDummy_ | MasVnrDummy_ | FirePlaceDummy_ | GarageDummy_ | PoolDummy_ | FenceDummy_ | MiscFeatureDummy_ | SalePrice_log |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0000 | 0.00000 | 227.00000 | 0.000 | 0.000000 | 0.000000 | 0.00000 | 0.00000 | 15.0000 | 1.0000 | 1.00000 | 1.0000 | 1.0000 | 0.0000 | 0.0000 | 0.000000 | 0.0000 | 2.0000000 | 2.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.000000 | 0.0000000 | 78.00000 | 1.0000000 | 1.0000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.000000 | 0.00000 | 0.000000 | 0.00000 | 0.0 | 1.0000000 | 15.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
| min | 1461.0000 | 20.00000 | 21.00000 | 1470.000 | 1.000000 | 1.000000 | 1879.00000 | 1950.00000 | 0.0000 | 0.0000 | 0.00000 | 0.0000 | 0.0000 | 407.0000 | 0.0000 | 0.000000 | 407.0000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 3.000000 | 0.0000000 | 1895.00000 | 0.0000000 | 0.0000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.000000 | 0.00000 | 1.000000 | 2006.00000 | 135751.3 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 11.8185800 |
| median | 2190.0000 | 50.00000 | 67.00000 | 9399.000 | 6.000000 | 5.000000 | 1973.00000 | 1992.00000 | 0.0000 | 350.5000 | 0.00000 | 460.0000 | 988.0000 | 1079.0000 | 0.0000 | 0.000000 | 1432.0000 | 0.0000000 | 0.0000000 | 2.0000000 | 0.0000000 | 3.0000000 | 1.0000000 | 6.000000 | 0.0000000 | 1979.00000 | 2.0000000 | 480.0000 | 0.00000 | 28.00000 | 0.00000 | 0.00000 | 0.00000 | 0.000000 | 0.00000 | 6.000000 | 2008.00000 | 179208.7 | 1.0000000 | 0.0000000 | 0.0000000 | 1.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 12.0963061 |
| mean | 2190.0000 | 57.37834 | 68.58036 | 9819.161 | 6.078821 | 5.553804 | 1971.35778 | 1983.66278 | 100.7091 | 439.2037 | 52.61934 | 554.2949 | 1046.1180 | 1156.5346 | 325.9678 | 3.543523 | 1486.0459 | 0.4344544 | 0.0652025 | 1.5709390 | 0.3776559 | 2.8540096 | 1.0424949 | 6.385195 | 0.5812200 | 1977.72122 | 1.7661180 | 472.7689 | 93.17478 | 48.31391 | 24.24332 | 1.79438 | 17.06443 | 1.744345 | 58.16792 | 6.104181 | 2007.76971 | 179183.9 | 0.9718793 | 0.3933518 | 0.4996573 | 0.9479095 | 0.0041124 | 0.1987663 | 0.0349554 | 12.0920649 |
| max | 2919.0000 | 190.00000 | 200.00000 | 56600.000 | 10.000000 | 9.000000 | 2010.00000 | 2010.00000 | 1290.0000 | 4010.0000 | 1526.00000 | 2140.0000 | 5095.0000 | 5095.0000 | 1862.0000 | 1064.000000 | 5095.0000 | 3.0000000 | 2.0000000 | 4.0000000 | 2.0000000 | 6.0000000 | 2.0000000 | 15.000000 | 4.0000000 | 2207.00000 | 5.0000000 | 1488.0000 | 1424.00000 | 742.00000 | 1012.00000 | 360.00000 | 576.00000 | 800.000000 | 17000.00000 | 12.000000 | 2010.00000 | 281644.0 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 1.0000000 | 12.5483991 |
| sd | 421.3213 | 42.74688 | 22.37684 | 4955.517 | 1.436812 | 1.113740 | 30.39007 | 21.13047 | 177.6259 | 455.2680 | 176.75393 | 437.2605 | 442.8986 | 398.1658 | 420.6102 | 44.043251 | 485.5661 | 0.5306475 | 0.2524683 | 0.5551899 | 0.5030167 | 0.8297884 | 0.2084717 | 1.508895 | 0.6474205 | 26.43117 | 0.7759451 | 217.0486 | 127.74488 | 68.88336 | 67.22777 | 20.20784 | 56.60976 | 30.491646 | 630.80698 | 2.722432 | 1.30174 | 16518.3 | 0.1653744 | 0.4886630 | 0.5001713 | 0.2222857 | 0.0640180 | 0.3992086 | 0.1837300 | 0.0900529 |
Obs: Na tabela acima a
linha statistic = 1 representa a quantidade de
valores faltantes no banco de test.
Note que de todas as variáveis numéricas do banco,
- MasVnrArea: contém 15 valores faltantes dos 1459 valores totais da
base de treino, representando 1 %.
- LotFrontage: contém 227 valores faltantes dos 1459 valores totais da
base de treino, representando 15.6 %.
- BsmtFinSF1: contém 1 valores faltantes dos 1459 valores totais da base
de treino, representando 0.1 %.
- BsmtFinSF2: contém 1 valores faltantes dos 1459 valores totais da base
de treino, representando 0.1 %.
- BsmtUnfSF: contém 1 valores faltantes dos 1459 valores totais da base
de treino, representando 0.1 %.
- TotalBsmtSF: contém 1 valores faltantes dos 1459 valores totais da
base de treino, representando 0.1 %.
- BsmtFullBath: contém 2 valores faltantes dos 1459 valores totais da
base de treino, representando 0.1 %.
- BsmtHalfBath: contém 2 valores faltantes dos 1459 valores totais da
base de treino, representando 0.1 %.
- GarageYrBlt: contém 78 valores faltantes dos 1459 valores totais da
base de treino, representando 5.3 %.
- GarageCars: contém 1 valores faltantes dos 1459 valores totais da base
de treino, representando 0.1 %.
- GarageArea: contém 1 valores faltantes dos 1459 valores totais da base
de treino, representando 0.1 %.
- BsmtDummy_: contém 1 valores faltantes dos 1459 valores totais da base
de treino, representando 0.1 %.
- MasVnrDummy_: contém 15 valores faltantes dos 1459 valores totais da
base de treino, representando 1 %.
Note que a base de teste tem valores faltantes em mais variáveis
diferentes do que a base de treino, mas ainda sim, a que mais chama
atenção é LotFrontage.
Como a variável LotFrontage tem muitos valores
faltantes, a simplesmente remoção não seria uma boa alternativa. Podemos
pensar em fazer uma regressão simples nessa variável, se encontrarmos
uma que seja correlacionada com ela e ajustar esses valores
faltantes.
Regressão para a variável LotFrontage
# Histograma
h_train1_chart_LF_1 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = LotFrontage)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20) +
ggtitle("Histograma da variável LotFrontage") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("LotFrontage (ft²)") +
ylab("Frequência")
#ggplotly(h_train1_chart_LF_1)
# Histograma e Densidade
h_train1_chart_LF_2 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = LotFrontage)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20,
aes(y = (..count..) / sum(..count..))) +
geom_density(col = 2, size = 1, aes(y = 20 * (..count..) / sum(..count..))) +
ggtitle("Histograma e curva de densidade da variável LotFrontage") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("LotFrontage (ft²)") +
ylab("Frequência relativa")
#ggplotly(h_train1_chart_LF_2)
# QQPlot
h_train1_chart_LF_3 <- ggplot(ds2_all[ds2_all$table == "train",], aes(sample = LotFrontage)) +
stat_qq_point(size = 2, color = "lightblue", pch=19) +
stat_qq_line(color="red", size=0.5) +
ggtitle("Normal Q-Q Plot da variável LotFrontage") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("Theoretical") + ylab("Sample")
#ggplotly(h_train1_chart_LF_3)
#qqnorm(ds2_all[ds2_all$table == "train",]$LotFrontage)
grid.arrange(h_train1_chart_LF_1,
h_train1_chart_LF_2,
h_train1_chart_LF_3,
nrow = 2,
ncol = 2)De maneira similar, temos que LotFrontage é também
assimétrica à direita (positiva), e como vimos, o logaritmo pode
resolver esse problema.
ds2_all$LotFrontage_log <- log(ds2_all$LotFrontage)
ds2_all$LotArea_log <- log(ds2_all$LotArea)# Histograma
h_train1_chart_LF_4 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = LotFrontage_log)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20) +
ggtitle("Histograma da variável LotFrontage_log") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("log(LotFrontage (ft²))") +
ylab("Frequência")
#ggplotly(h_train1_chart_LF_4)
# Histograma e Densidade
h_train1_chart_LF_5 <- ggplot(ds2_all[ds2_all$table == "train",] , aes(x = LotFrontage_log)) +
geom_histogram(color = "black",
fill = "lightblue",
bins = 20,
aes(y = (..count..) / sum(..count..))) +
geom_density(col = 2, size = 1, aes(y = 20 * (..count..) / sum(..count..))) +
ggtitle("Histograma e curva de densidade da variável LotFrontage_log") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("log(LotFrontage (ft²))") +
ylab("Frequência relativa")
#ggplotly(h_train1_chart_LF_5)
# QQPlot
h_train1_chart_LF_6 <- ggplot(ds2_all[ds2_all$table == "train",], aes(sample = LotFrontage_log)) +
stat_qq_point(size = 2, color = "lightblue", pch=19) +
stat_qq_line(color="red", size=0.5) +
ggtitle("Normal Q-Q Plot da variável LotFrontage_log") +
theme(plot.title = element_text(size = 7, face = "bold")) +
xlab("Theoretical") + ylab("Sample")
#ggplotly(h_train1_chart_LF_6)
grid.arrange(h_train1_chart_LF_4,
h_train1_chart_LF_5,
h_train1_chart_LF_6,
nrow = 2,
ncol = 2)Visualmente, após a apligação da função logarítmica, temos que a
distribuição da variável LotFrontage_log se assemelha
mais à Normal.
Relação entre as variáveis
#pairs(ds1_train1_final)
ggpairs(ds2_all[, -c(13, 25, 106)],
ggplot2::aes(colour = table, alpha = 0.5),
upper = list(continuous = wrap("cor", size=2, binwidth=0.5)),
lower=list(combo=wrap("facethist", binwidth=0.8))
) + theme_bw()# num_cols <- unlist(lapply(ds2_all, is.numeric)) # Identify numeric columns (R BASE)
# h_train_cormat_1 <- ds2_all[ , num_cols]
h_train_cormat_1 <- select_if(ds2_all, is.numeric) # Subset numeric columns (dplyr)
h_train_cormat_2 <- round( x = cor(na.omit(h_train_cormat_1)), digits=2)
#head(h_train_cormat_2)
h_train_cormat_3 <- melt(h_train_cormat_2) # heatmap (reshape2)
#head(h_train_cormat_3)
h_train_cormat_4 <- ggplot(data = h_train_cormat_3,
aes(x=Var1, y=Var2, fill = value)) +
geom_tile() +
theme(axis.text.x=element_text(color = "black", size=7, angle=30, vjust=.8, hjust=0.8),
axis.text.y=element_text(color = "black", size=7, angle=0, vjust=.8, hjust=0.8)
) +
ggtitle("Correlation Heatmap")
ggplotly(h_train_cormat_4)Separando somente os que tem correlação acima de 0.5 ou abaixo -0.5.
h_train_cormat_5 <- ggplot(data = h_train_cormat_3[h_train_cormat_3$value > 0.5 |
h_train_cormat_3$value < -0.5,],
aes(x=Var1, y=Var2, fill = value)) +
geom_tile() +
theme(axis.text.x=element_text(color = "black", size=7, angle=30, vjust=.8, hjust=0.8),
axis.text.y=element_text(color = "black", size=7, angle=0, vjust=.8, hjust=0.8)
) +
ggtitle("Correlation Heatmap")
ggplotly(h_train_cormat_5)Abaixo temos a correlação em ordem decrescente de
LotFrontage e as demais variáveis numéricas. Note que
LotArea é a que mais se correlaciona com ela, com uma
correlação de 0.49.
h_train_cormat_3a <- h_train_cormat_3[h_train_cormat_3$Var1 == "LotFrontage" ,]
h_train_cormat_3a[order(h_train_cormat_3a$value, decreasing = TRUE),] %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Var1 | Var2 | value | |
|---|---|---|---|
| 99 | LotFrontage | LotFrontage | 1.00 |
| 2211 | LotFrontage | LotFrontage_log | 0.94 |
| 2259 | LotFrontage | LotArea_log | 0.69 |
| 147 | LotFrontage | LotArea | 0.49 |
| 627 | LotFrontage | 1stFlrSF | 0.45 |
| 771 | LotFrontage | GrLivArea | 0.38 |
| 1299 | LotFrontage | GarageArea | 0.38 |
| 579 | LotFrontage | TotalBsmtSF | 0.35 |
| 1107 | LotFrontage | TotRmsAbvGrd | 0.35 |
| 2163 | LotFrontage | SalePrice_log | 0.33 |
| 1251 | LotFrontage | GarageCars | 0.32 |
| 1779 | LotFrontage | SalePrice | 0.31 |
| 1155 | LotFrontage | Fireplaces | 0.25 |
| 1011 | LotFrontage | BedroomAbvGr | 0.24 |
| 387 | LotFrontage | MasVnrArea | 0.22 |
| 435 | LotFrontage | BsmtFinSF1 | 0.22 |
| 1923 | LotFrontage | FirePlaceDummy_ | 0.22 |
| 195 | LotFrontage | OverallQual | 0.21 |
| 1587 | LotFrontage | PoolArea | 0.18 |
| 2019 | LotFrontage | PoolDummy_ | 0.18 |
| 915 | LotFrontage | FullBath | 0.17 |
| 1395 | LotFrontage | OpenPorchSF | 0.17 |
| 1875 | LotFrontage | MasVnrDummy_ | 0.17 |
| 819 | LotFrontage | BsmtFullBath | 0.12 |
| 1347 | LotFrontage | WoodDeckSF | 0.12 |
| 291 | LotFrontage | YearBuilt | 0.11 |
| 531 | LotFrontage | BsmtUnfSF | 0.11 |
| 339 | LotFrontage | YearRemodAdd | 0.09 |
| 1203 | LotFrontage | GarageYrBlt | 0.08 |
| 1539 | LotFrontage | ScreenPorch | 0.07 |
| 1635 | LotFrontage | MiscVal | 0.05 |
| 483 | LotFrontage | BsmtFinSF2 | 0.04 |
| 963 | LotFrontage | HalfBath | 0.04 |
| 675 | LotFrontage | 2ndFlrSF | 0.03 |
| 1491 | LotFrontage | 3SsnPorch | 0.03 |
| 1443 | LotFrontage | EnclosedPorch | 0.02 |
| 2115 | LotFrontage | MiscFeatureDummy_ | 0.02 |
| 1683 | LotFrontage | MoSold | 0.01 |
| 1059 | LotFrontage | KitchenAbvGr | 0.00 |
| 2067 | LotFrontage | FenceDummy_ | 0.00 |
| 723 | LotFrontage | LowQualFinSF | -0.01 |
| 1731 | LotFrontage | YrSold | -0.01 |
| 867 | LotFrontage | BsmtHalfBath | -0.02 |
| 3 | LotFrontage | Id | -0.03 |
| 1827 | LotFrontage | BsmtDummy_ | -0.03 |
| 243 | LotFrontage | OverallCond | -0.07 |
| 51 | LotFrontage | MSSubClass | -0.42 |
| 1971 | LotFrontage | GarageDummy_ | NA |
Abaixo temos a correlação em ordem decrescente de
LotFrontage_log e as demais variáveis numéricas. Note
que LotArea_log é a que mais se correlaciona com ela,
com uma correlação de 0.77.
h_train_cormat_3b <- h_train_cormat_3[h_train_cormat_3$Var1 == "LotFrontage_log" ,]
h_train_cormat_3b[order(h_train_cormat_3b$value, decreasing = TRUE),] %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Var1 | Var2 | value | |
|---|---|---|---|
| 2255 | LotFrontage_log | LotFrontage_log | 1.00 |
| 143 | LotFrontage_log | LotFrontage | 0.94 |
| 2303 | LotFrontage_log | LotArea_log | 0.77 |
| 191 | LotFrontage_log | LotArea | 0.45 |
| 671 | LotFrontage_log | 1stFlrSF | 0.42 |
| 1151 | LotFrontage_log | TotRmsAbvGrd | 0.35 |
| 1343 | LotFrontage_log | GarageArea | 0.35 |
| 815 | LotFrontage_log | GrLivArea | 0.34 |
| 623 | LotFrontage_log | TotalBsmtSF | 0.33 |
| 2207 | LotFrontage_log | SalePrice_log | 0.32 |
| 1823 | LotFrontage_log | SalePrice | 0.30 |
| 1295 | LotFrontage_log | GarageCars | 0.29 |
| 1055 | LotFrontage_log | BedroomAbvGr | 0.27 |
| 1199 | LotFrontage_log | Fireplaces | 0.23 |
| 1967 | LotFrontage_log | FirePlaceDummy_ | 0.21 |
| 479 | LotFrontage_log | BsmtFinSF1 | 0.18 |
| 239 | LotFrontage_log | OverallQual | 0.17 |
| 431 | LotFrontage_log | MasVnrArea | 0.16 |
| 959 | LotFrontage_log | FullBath | 0.15 |
| 1439 | LotFrontage_log | OpenPorchSF | 0.15 |
| 575 | LotFrontage_log | BsmtUnfSF | 0.13 |
| 1919 | LotFrontage_log | MasVnrDummy_ | 0.13 |
| 1391 | LotFrontage_log | WoodDeckSF | 0.11 |
| 1631 | LotFrontage_log | PoolArea | 0.11 |
| 2063 | LotFrontage_log | PoolDummy_ | 0.11 |
| 863 | LotFrontage_log | BsmtFullBath | 0.09 |
| 1583 | LotFrontage_log | ScreenPorch | 0.08 |
| 335 | LotFrontage_log | YearBuilt | 0.07 |
| 383 | LotFrontage_log | YearRemodAdd | 0.06 |
| 1247 | LotFrontage_log | GarageYrBlt | 0.05 |
| 1679 | LotFrontage_log | MiscVal | 0.05 |
| 527 | LotFrontage_log | BsmtFinSF2 | 0.04 |
| 2159 | LotFrontage_log | MiscFeatureDummy_ | 0.04 |
| 1103 | LotFrontage_log | KitchenAbvGr | 0.03 |
| 1487 | LotFrontage_log | EnclosedPorch | 0.03 |
| 1535 | LotFrontage_log | 3SsnPorch | 0.03 |
| 2111 | LotFrontage_log | FenceDummy_ | 0.03 |
| 1727 | LotFrontage_log | MoSold | 0.01 |
| 719 | LotFrontage_log | 2ndFlrSF | 0.00 |
| 767 | LotFrontage_log | LowQualFinSF | -0.01 |
| 1007 | LotFrontage_log | HalfBath | -0.01 |
| 1775 | LotFrontage_log | YrSold | -0.01 |
| 911 | LotFrontage_log | BsmtHalfBath | -0.02 |
| 47 | LotFrontage_log | Id | -0.03 |
| 287 | LotFrontage_log | OverallCond | -0.04 |
| 1871 | LotFrontage_log | BsmtDummy_ | -0.04 |
| 95 | LotFrontage_log | MSSubClass | -0.55 |
| 2015 | LotFrontage_log | GarageDummy_ | NA |
Note que a variável LotArea é a que tem correlação mais alta. Vale ressaltar que também observamos os maiores valores negativos e não somente os positivos.
#plot(ds2_all$LotArea, ds2_all$LotFrontage)
#hist(ds2_all$LotArea)
h_train1_chart_LF_7 = ggplot(data = ds2_all[ds2_all$table == "train" & !is.na(ds2_all$LotArea_log),],
aes(x = LotArea_log, y = LotFrontage_log))+
geom_point(color="lightblue")+
geom_smooth(method = "lm", se = TRUE, color='black')+
ggtitle("Gráfico de Dispersão") +
xlab("LotAre_log") + ylab("LotForntage_log")
ggplotly(h_train1_chart_LF_7)A partir do gráfico acima, pode-se dizer que uma regressão linear
simples entre as duas variáveis é possível. Vamos então verificar o
ajuste desse modelo, ajustar os valores e inserí-los na base principal
quando esse valor for faltante.
Para essa regressão, vou considerar a base toda (tanto train quanto
test) e fazer uma subdivisão de treino e teste aleatória, para verificar
se o modelo é de fato bem preditivo.
ds2_all_LF <- ds2_all[, c("Id", "LotFrontage_log", "LotArea_log")]
ds2_all_LF_rm <- na.omit(ds2_all_LF)
set.seed(123)
index_LF <- createDataPartition(ds2_all_LF_rm$LotFrontage_log,
p = 0.7,
list = FALSE)
ds2_LF_train <- ds2_all_LF_rm[index_LF, ]
ds2_LF_test <- ds2_all_LF_rm[-index_LF, ]Modelo 1: com intercepto
h_LF_model1 <- lm( LotFrontage_log ~ LotArea_log ,
data = ds2_LF_train)
summary(h_LF_model1)##
## Call:
## lm(formula = LotFrontage_log ~ LotArea_log, data = ds2_LF_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3118 -0.1255 0.0280 0.1439 1.2517
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.71076 0.10743 -6.616 0.0000000000492 ***
## LotArea_log 0.53940 0.01183 45.607 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2413 on 1704 degrees of freedom
## Multiple R-squared: 0.5497, Adjusted R-squared: 0.5494
## F-statistic: 2080 on 1 and 1704 DF, p-value: < 0.00000000000000022
#par(mfrow=c(2,2))
#plot(h_LF_model1)
autoplot(h_LF_model1)Modelo 2: sem intercepto
h_LF_model2 <- lm( LotFrontage_log ~ LotArea_log - 1 ,
data = ds2_LF_train)
summary(h_LF_model2)##
## Call:
## lm(formula = LotFrontage_log ~ LotArea_log - 1, data = ds2_LF_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20400 -0.12192 0.02706 0.14598 1.19128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## LotArea_log 0.4612642 0.0006511 708.5 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2443 on 1705 degrees of freedom
## Multiple R-squared: 0.9966, Adjusted R-squared: 0.9966
## F-statistic: 5.019e+05 on 1 and 1705 DF, p-value: < 0.00000000000000022
#par(mfrow=c(2,2))
#plot(h_LF_model2)
autoplot(h_LF_model2)O Modelo 2 (sem intercepto) apresenta um R² de 99.7%.
anova(h_LF_model1, h_LF_model2)## Analysis of Variance Table
##
## Model 1: LotFrontage_log ~ LotArea_log
## Model 2: LotFrontage_log ~ LotArea_log - 1
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1704 99.18
## 2 1705 101.73 -1 -2.5479 43.775 0.00000000004919 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analisando a ANOVA para os dois modelos a diferença da Soma Quadrada
dos Resíduos (RSS) não aumentou muito quando retiramos o intercepto, mas
o ajuste do modelo (R²) melhorou significativamente, de 54.9 % para 99.7
%.
Vamos agora fazer o cross validation dos dois modelos usando
resampling.
# Specify resampling plan
cv <- trainControl(
method = "repeatedcv",
number = 10,
repeats = 5
)
set.seed(123) # for reproducibility
cv_h_LF_model1 <- train(
form = LotFrontage_log ~ LotArea_log,
data = ds2_LF_train,
method = "lm",
trControl = cv,
preProcess = c("center", "scale"),
tuneGrid = expand.grid(intercept = TRUE)
)
summary(cv_h_LF_model1)##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3118 -0.1255 0.0280 0.1439 1.2517
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.181395 0.005841 715.87 <0.0000000000000002 ***
## LotArea_log 0.266468 0.005843 45.61 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2413 on 1704 degrees of freedom
## Multiple R-squared: 0.5497, Adjusted R-squared: 0.5494
## F-statistic: 2080 on 1 and 1704 DF, p-value: < 0.00000000000000022
set.seed(123) # for reproducibility
cv_h_LF_model2 <- train(
form = LotFrontage_log ~ -1 + LotArea_log ,
data = ds2_LF_train,
method = "lm",
trControl = cv,
preProcess = c("center", "scale"),
tuneGrid = expand.grid(intercept = FALSE)
)
summary(cv_h_LF_model2)##
## Call:
## lm(formula = .outcome ~ 0 + ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## 2.870 4.056 4.209 4.325 5.433
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## LotArea_log 0.2665 0.1015 2.626 0.00871 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.19 on 1705 degrees of freedom
## Multiple R-squared: 0.004029, Adjusted R-squared: 0.003445
## F-statistic: 6.897 on 1 and 1705 DF, p-value: 0.00871
Usando esse método de resampling, vemos que no final das contas,
ambos modelos me dão erros e ajustes iguais, isso porque o cross
validation nesse caso não consegue fazer a destinção de um modelo com ou
sem intercepto. Mesmo usando a opção
html tuneGrid = expand.grid(intercept = FALSE) mas isso só
remove o incercepto depois do cálculo da regressão (com o intercepto) já
ter sido feito, o que não é o que queremos.
#### Evaluation
# Extract out of sample performance measures
summary(resamples(list(
model1 = cv_h_LF_model1,
model2 = cv_h_LF_model2
)))##
## Call:
## summary.resamples(object = resamples(list(model1 = cv_h_LF_model1, model2
## = cv_h_LF_model2)))
##
## Models: model1, model2
## Number of resamples: 50
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## model1 0.1458912 0.169162 0.1768206 0.1761007 0.1827132 0.2171508 0
## model2 4.1477406 4.171669 4.1803189 4.1813083 4.1941816 4.2140968 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## model1 0.1967674 0.2267414 0.2397586 0.2406741 0.2546231 0.3233613 0
## model2 4.1556224 4.1776655 4.1877780 4.1882604 4.2000731 4.2193947 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## model1 0.3374527 0.4809316 0.5679558 0.5527062 0.5943837 0.7124251 0
## model2 0.3374527 0.4809316 0.5679558 0.5527062 0.5943837 0.7124251 0
Dessa maneira, vamos fazer o teste de modo simple, usando a base de teste e calculando os erros.
summary(h_LF_model1)##
## Call:
## lm(formula = LotFrontage_log ~ LotArea_log, data = ds2_LF_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3118 -0.1255 0.0280 0.1439 1.2517
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.71076 0.10743 -6.616 0.0000000000492 ***
## LotArea_log 0.53940 0.01183 45.607 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2413 on 1704 degrees of freedom
## Multiple R-squared: 0.5497, Adjusted R-squared: 0.5494
## F-statistic: 2080 on 1 and 1704 DF, p-value: < 0.00000000000000022
summary(h_LF_model2)##
## Call:
## lm(formula = LotFrontage_log ~ LotArea_log - 1, data = ds2_LF_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20400 -0.12192 0.02706 0.14598 1.19128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## LotArea_log 0.4612642 0.0006511 708.5 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2443 on 1705 degrees of freedom
## Multiple R-squared: 0.9966, Adjusted R-squared: 0.9966
## F-statistic: 5.019e+05 on 1 and 1705 DF, p-value: < 0.00000000000000022
ds2_LF_train$pred_m1 <- h_LF_model1$fitted.values
ds2_LF_train$pred_m2 <- h_LF_model2$fitted.values
ds2_LF_test$pred_m1 <- h_LF_model1 %>% predict(ds2_LF_test)
ds2_LF_test$pred_m2 <- h_LF_model2 %>% predict(ds2_LF_test)table1_train_m1 <- data.frame(R2 = R2(ds2_LF_train$pred_m1, ds2_LF_train$LotFrontage_log),
RMSE = RMSE(ds2_LF_train$pred_m1, ds2_LF_train$LotFrontage_log),
MAE = MAE(ds2_LF_train$pred_m1, ds2_LF_train$LotFrontage_log)
)
table1_train_m2 <- data.frame(R2 = R2(ds2_LF_train$pred_m2, ds2_LF_train$LotFrontage_log),
RMSE = RMSE(ds2_LF_train$pred_m2, ds2_LF_train$LotFrontage_log),
MAE = MAE(ds2_LF_train$pred_m2, ds2_LF_train$LotFrontage_log)
)
table1_test_m1 <- data.frame(R2 = R2(ds2_LF_test$pred_m1, ds2_LF_test$LotFrontage_log),
RMSE = RMSE(ds2_LF_test$pred_m1, ds2_LF_test$LotFrontage_log),
MAE = MAE(ds2_LF_test$pred_m1, ds2_LF_test$LotFrontage_log)
)
table1_test_m2 <- data.frame(R2 = R2(ds2_LF_test$pred_m2, ds2_LF_test$LotFrontage_log),
RMSE = RMSE(ds2_LF_test$pred_m2, ds2_LF_test$LotFrontage_log),
MAE = MAE(ds2_LF_test$pred_m2, ds2_LF_test$LotFrontage_log)
)
table1 <- rbind(table1_train_m1, table1_train_m2, table1_test_m1, table1_test_m2)
row.names(table1) <- c("train_model_1", "train_model_2", "test_model_1", "test_model_2")
table1 %>% kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| R2 | RMSE | MAE | |
|---|---|---|---|
| train_model_1 | 0.5496813 | 0.2411142 | 0.1758202 |
| train_model_2 | 0.5496813 | 0.2441916 | 0.1805571 |
| test_model_1 | 0.6800449 | 0.2011005 | 0.1538040 |
| test_model_2 | 0.6800449 | 0.2086177 | 0.1598457 |
Como podemos ver as estatisticas que comparam os modelos, não faz
muita diferença entre se ter ou não o intercepto, e como é mais lógico
neste caso, vou usar o modelo sem intercepto como o modelo final.
summary(h_LF_model2)##
## Call:
## lm(formula = LotFrontage_log ~ LotArea_log - 1, data = ds2_LF_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20400 -0.12192 0.02706 0.14598 1.19128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## LotArea_log 0.4612642 0.0006511 708.5 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2443 on 1705 degrees of freedom
## Multiple R-squared: 0.9966, Adjusted R-squared: 0.9966
## F-statistic: 5.019e+05 on 1 and 1705 DF, p-value: < 0.00000000000000022
exp(h_LF_model2$coefficients)## LotArea_log
## 1.586078
E explicação lógica desse modelo é a distância entre a casa e a rua
possui uma relação linear com o tamanho total do terreno. Pelo modelo,
temos que a cada uma unidade de acrescimo na variável
LrArea_log, a variável LotFrontage_log
aumenta 0.4612642 . Se voltarmos ambas as variáveis para o normal
(aplicandoa função exponencial), temos que cada uma unidade de acrescimo
na variável LrArea_log, a variável
LotFrontage aumenta 1.5860778 .
Vamos então criar a variável preditiva na base de dados original e
utilizar esses valores para preencher os faltantes.
ds2_all$LotFrontage_log_pred = h_LF_model2 %>% predict(ds2_all)ds2_all$LotFrontage_log_2 <- ifelse(is.na(ds2_all$LotFrontage_log),
ds2_all$LotFrontage_log_pred,
ds2_all$LotFrontage_log)
ds2_all$LotFrontage_2 <- ifelse(is.na(ds2_all$LotFrontage),
exp(ds2_all$LotFrontage_log_pred),
ds2_all$LotFrontage)Vamos verificar como ficou a variável final com relação à
original.
h_train1_chart_LF_8 <- ggplot(ds2_all, aes(x = LotArea,
y = LotFrontage_2,
fill = is.na(LotFrontage))) +
geom_boxplot() +
ggtitle("Gráfico de Box-plot") +
xlab("LotArea (ft²)") +
ylab("LotFrontage_2 (ft²)")
h_train1_chart_LF_8Resumo da variável que acabamos de predizer para a base de
treino:
summary(ds2_all[ds2_all$table == "train" , c("LotFrontage", "LotFrontage_2")])## LotFrontage LotFrontage_2
## Min. : 21.00 Min. : 21.00
## 1st Qu.: 59.00 1st Qu.: 60.00
## Median : 69.00 Median : 70.00
## Mean : 70.05 Mean : 70.81
## 3rd Qu.: 80.00 3rd Qu.: 80.00
## Max. :313.00 Max. :313.00
## NA's :259
Resumo da variável que acabamos de predizer para a base de
teste:
summary(ds2_all[ds2_all$table == "test" , c("LotFrontage", "LotFrontage_2")])## LotFrontage LotFrontage_2
## Min. : 21.00 Min. : 21.00
## 1st Qu.: 58.00 1st Qu.: 60.00
## Median : 67.00 Median : 68.00
## Mean : 68.58 Mean : 69.18
## 3rd Qu.: 80.00 3rd Qu.: 80.00
## Max. :200.00 Max. :200.00
## NA's :227
A variável ajustada parece estar um pouco mais inflada que a
original, mas nada muito descrepante. Os valores mínimo e máximo
continuam os mesmos em ambos subsets.
Concluímos então essa etapa e vamos olhar agora para o modelo
principal.
Variável Ano de Construção da Garagem (GarageYrBlt)
Notamos que temos muitos valores faltantes nessa variável.
ds2_all_GYB_summary1 = ds2_all %>%
dplyr::select(Id, GarageYrBlt, GarageDummy_, table) %>%
group_by(table, GarageDummy_) %>%
#filter(raca=="Branca") %>%
summarise(
n = n(),
min = min(GarageYrBlt, na.rm = T),
max = max(GarageYrBlt, na.rm = T),
mean = mean(GarageYrBlt, na.rm = T),
median = median(GarageYrBlt, na.rm = T),
sd = sd(GarageYrBlt, na.rm = T),
missing = sum(is.na(GarageYrBlt))
)
ds2_all_GYB_summary1 %>% kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| table | GarageDummy_ | n | min | max | mean | median | sd | missing |
|---|---|---|---|---|---|---|---|---|
| test | 0 | 76 | Inf | -Inf | NaN | NA | NA | 76 |
| test | 1 | 1383 | 1895 | 2207 | 1977.721 | 1979 | 26.43117 | 2 |
| train | 0 | 81 | Inf | -Inf | NaN | NA | NA | 81 |
| train | 1 | 1379 | 1900 | 2010 | 1978.506 | 1980 | 24.68972 | 0 |
Pela tablea acima é possível notar que todos os valores faltantes são
de fato para propriedades que não possuem garagem, o que estaria
correto.
Uma coisa que chama bastante atenção é o ano máximo da base de teste,
mostrando o que possivelmente foi erro de digitação. Assim, vamos setar
o ano de 2022, pois é quando estamos fazendo esse trabalho e qualquer
ano acima disso vamos colocar como nulo.
ds2_all$GarageYrBlt_ <- ifelse(ds2_all$GarageYrBlt > 2022, NA, ds2_all$GarageYrBlt) ds2_all_GYB_summary2 = ds2_all %>%
dplyr::select(Id, GarageYrBlt_, GarageDummy_, table) %>%
group_by(table, GarageDummy_) %>%
#filter(raca=="Branca") %>%
summarise(
n = n(),
min = min(GarageYrBlt_, na.rm = T),
max = max(GarageYrBlt_, na.rm = T),
mean = mean(GarageYrBlt_, na.rm = T),
median = median(GarageYrBlt_, na.rm = T),
sd = sd(GarageYrBlt_, na.rm = T),
missing = sum(is.na(GarageYrBlt_))
)
ds2_all_GYB_summary2 %>% kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| table | GarageDummy_ | n | min | max | mean | median | sd | missing |
|---|---|---|---|---|---|---|---|---|
| test | 0 | 76 | Inf | -Inf | NaN | NA | NA | 76 |
| test | 1 | 1383 | 1895 | 2010 | 1977.555 | 1979 | 25.70924 | 3 |
| train | 0 | 81 | Inf | -Inf | NaN | NA | NA | 81 |
| train | 1 | 1379 | 1900 | 2010 | 1978.506 | 1980 | 24.68972 | 0 |
Base de Dados Ajustada
Aqui vamos remover as colunas que necessitavam de ajustes e deixar
somente as ajustadas.
# removing old variables
ds2_all_semi = ds2_all %>% dplyr::select(
Id,
table,
SalePrice,
SalePrice_log,
`1stFlrSF`,
`2ndFlrSF`,
`3SsnPorch`,
#Alley,
Alley_,
BedroomAbvGr,
BldgType,
BsmtDummy_,
#BsmtCond,
BsmtCond_,
#BsmtExposure,
BsmtExposure_,
BsmtFinSF1,
BsmtFinSF2,
#BsmtFinType1,
BsmtFinType1_,
#BsmtFinType2,
BsmtFinType2_,
BsmtFullBath,
BsmtHalfBath,
#BsmtQual,
BsmtQual_,
BsmtUnfSF,
#CentralAir,
CentralAir_,
Condition1,
Condition2,
Electrical,
EnclosedPorch,
#ExterCond,
ExterCond_,
Exterior1st,
Exterior2nd,
#ExterQual,
ExterQual_,
FenceDummy_,
#Fence,
Fence_,
FirePlaceDummy_,
#FireplaceQu,
FireplaceQu_,
Fireplaces,
Foundation,
FullBath,
Functional,
GarageArea,
GarageCars,
GarageDummy_,
#GarageCond,
GarageCond_,
#GarageFinish,
GarageFinish_,
#GarageQual,
GarageQual_,
#GarageType,
GarageType_,
#GarageYrBlt,
GarageYrBlt_,
GrLivArea,
HalfBath,
Heating,
#HeatingQC,
HeatingQC_,
HouseStyle,
KitchenAbvGr,
#KitchenQual,
KitchenQual_,
LandContour,
LandSlope,
LotArea,
#LotArea_log,
LotConfig,
#LotFrontage,
LotFrontage_2,
#LotFrontage_log,
#LotFrontage_log_2,
#LotFrontage_log_pred,
LotShape,
LowQualFinSF,
MasVnrDummy_,
MasVnrArea,
#MasVnrType,
MasVnrType_,
MiscFeatureDummy_,
#MiscFeature,
MiscFeature_,
MiscVal,
MoSold,
#MSSubClass,
MSSubClass_,
MSZoning,
Neighborhood,
OpenPorchSF,
#OverallCond,
OverallCond_,
#OverallQual,
OverallQual_,
PavedDrive,
PoolArea,
PoolDummy_,
#PoolQC,
PoolQC_,
RoofMatl,
RoofStyle,
SaleCondition,
SaleType,
ScreenPorch,
Street,
TotalBsmtSF,
TotRmsAbvGrd,
Utilities,
WoodDeckSF,
YearBuilt,
YearRemodAdd,
YrSold
)Valores faltantes
Verificar a quantidade de valores faltantes na base final.
Train
h_train2_summary1 <- as.data.frame(t(ds2_all_semi[ds2_all_semi$table == "train",] %>%
summarise(across(everything(), ~ sum(is.na(.x))))))
colnames(h_train2_summary1) <- c("Missing Observations")
h_train2_summary1$`-` <- ""
h_train2_summary1[h_train2_summary1$`Missing Observations` > 0, ] %>%
arrange(desc(`Missing Observations`)) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Missing Observations |
|
|
|---|---|---|
| GarageYrBlt_ | 81 | |
| MasVnrDummy_ | 8 | |
| MasVnrArea | 8 | |
| MasVnrType_ | 8 | |
| BsmtExposure_ | 1 | |
| BsmtFinType2_ | 1 | |
| Electrical | 1 |
Test
h_test2_summary1 <- as.data.frame(t(ds2_all_semi[ds2_all_semi$table == "test",] %>%
summarise(across(everything(), ~ sum(is.na(.x))))))
colnames(h_test2_summary1) <- c("Missing Observations")
h_test2_summary1$`-` <- ""
h_test2_summary1[h_test2_summary1$`Missing Observations` > 0, ] %>%
arrange(desc(`Missing Observations`)) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Missing Observations |
|
|
|---|---|---|
| GarageYrBlt_ | 79 | |
| MasVnrType_ | 16 | |
| MasVnrDummy_ | 15 | |
| MasVnrArea | 15 | |
| BsmtCond_ | 4 | |
| MSZoning | 4 | |
| BsmtExposure_ | 3 | |
| BsmtQual_ | 3 | |
| PoolQC_ | 3 | |
| BsmtFullBath | 2 | |
| BsmtHalfBath | 2 | |
| Functional | 2 | |
| GarageCond_ | 2 | |
| GarageFinish_ | 2 | |
| GarageQual_ | 2 | |
| Utilities | 2 | |
| BsmtDummy_ | 1 | |
| BsmtFinSF1 | 1 | |
| BsmtFinSF2 | 1 | |
| BsmtFinType1_ | 1 | |
| BsmtFinType2_ | 1 | |
| BsmtUnfSF | 1 | |
| Exterior1st | 1 | |
| Exterior2nd | 1 | |
| GarageArea | 1 | |
| GarageCars | 1 | |
| KitchenQual_ | 1 | |
| SaleType | 1 | |
| TotalBsmtSF | 1 |
Vamos remover os valores faltantes de forma condicional:
ds2_all_semi = ds2_all_semi %>% mutate(
remove_rows = case_when(
GarageDummy_ == 1 & is.na(GarageYrBlt_) == TRUE ~ 1 ,
GarageDummy_ == 1 & is.na(GarageArea) == TRUE ~ 1 ,
GarageDummy_ == 1 & is.na(GarageCars) == TRUE ~ 1 ,
GarageDummy_ == 1 & is.na(GarageCond_) == TRUE ~ 1 ,
GarageDummy_ == 1 & is.na(GarageFinish_) == TRUE ~ 1 ,
GarageDummy_ == 1 & is.na(GarageQual_) == TRUE ~ 1 ,
PoolDummy_ == 1 & is.na(PoolQC_) == TRUE ~ 1 ,
is.na(BsmtDummy_) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtCond_) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtQual_) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtExposure_) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtFinType1_) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtFinType2_) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtFullBath) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtHalfBath) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtFinSF1) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtFinSF2) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(BsmtUnfSF) == TRUE ~ 1 ,
BsmtDummy_ == 1 & is.na(TotalBsmtSF) == TRUE ~ 1 ,
is.na(Exterior1st) == TRUE ~ 1 ,
is.na(Functional) == TRUE ~ 1 ,
KitchenAbvGr > 0 & is.na(KitchenQual_) == TRUE ~ 1 ,
is.na(MasVnrDummy_) == TRUE ~ 1 ,
MasVnrDummy_ == 1 & is.na(MasVnrType_) == TRUE ~ 1 ,
MasVnrDummy_ == 1 & is.na(MasVnrArea) == TRUE ~ 1 ,
is.na(MSZoning) == TRUE ~ 1 ,
is.na(SaleType) == TRUE ~ 1 ,
is.na(Utilities) == TRUE ~ 1 ,
is.na(Electrical) == TRUE ~ 1 ,
TRUE ~ 0
)
)h_both2_summary1 <- as.data.frame(t(ds2_all_semi[ds2_all_semi$remove_rows == 0,] %>%
summarise(across(everything(), ~ sum(is.na(.x))))))
colnames(h_both2_summary1) <- c("Missing Observations")
h_both2_summary1$`-` <- ""
h_both2_summary1[h_both2_summary1$`Missing Observations` > 0, ] %>%
arrange(desc(`Missing Observations`)) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Missing Observations |
|
|
|---|---|---|
| GarageYrBlt_ | 155 | |
| BsmtFullBath | 1 | |
| BsmtHalfBath | 1 |
Note que agora restaram somente os valores que são de fato valores
reais. Para todos os casos, vamos deixar como está no momento.
ds2_all_semi[ds2_all_semi$remove_rows == 0 & is.na(ds2_all_semi$BsmtQual_),] %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)
ds2_all_semi[ds2_all_semi$remove_rows == 0 & ds2_all_semi$BsmtFinType1_ == "Unf",] %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)
ds2_all_semi[ds2_all_semi$remove_rows == 0 & is.na(ds2_all_semi$Electrical),] %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)
ds2_all_semi[ds2_all_semi$remove_rows == 0 & is.na(ds2_all_semi$BsmtHalfBath),] %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)
ds2_all_semi[is.na(ds2_all_semi$BsmtHalfBath),] %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)Relação entre as variáveis
ggpairs(ds2_all_semi[ds2_all_semi$table == "train", c(3:4,5:20)],
#ggplot2::aes(colour = Transported_, alpha = 0.5),
upper = list(continuous = wrap("cor", size=2, binwidth=0.5))
) + theme_bw()ds2_all_semi_Bsmt = ds2_all_semi[ds2_all_semi$table == "train",] %>%
dplyr::select("SalePrice" | contains("Bsm"))
ggpairs(ds2_all_semi_Bsmt,
ggplot2::aes(colour = as.factor(BsmtDummy_), alpha = 0.5)
) + theme_bw()ggpairs(ds2_all_semi[ds2_all_semi$table == "train", c(3:4,11:12)],
ggplot2::aes(colour = as.factor(BsmtDummy_), alpha = 0.5)
) + theme_bw()Pleo gráfico ascima podemos ver que os preços das casas são mais
elevados quando a casa possui um porão. A correlação entre elas é fraca
0.2 .
ggpairs(ds2_all_semi[ds2_all_semi$table == "train", c(3:4,11,56,84)],
ggplot2::aes(colour = as.factor(BsmtDummy_), alpha = 0.5)
) + theme_bw()A correlação entre A área total do porão e o log do preço de venda é
0.61 . Nem mesmo a área do lote está tão fortemente correlacionada com o
preço. Aparentemente, área construída agrega mais valor na venda.
Utilizando o mapa de calor que fizemos acima,
h_train1_cormat_1 <- select_if(ds2_all_semi[ds2_all_semi$table == "train", ], is.numeric) # Subset numeric columns (dplyr)
h_train1_cormat_2 <- round( x = cor(na.omit(h_train_cormat_1)), digits=2)
#head(h_train1_cormat_2)
h_train1_cormat_3 <- melt(h_train_cormat_2) # heatmap (reshape2)
#head(h_train1_cormat_3)
h_train1_cormat_4 <- ggplot(data = h_train_cormat_3,
aes(x=Var1, y=Var2, fill = value)) +
geom_tile() +
theme(axis.text.x=element_text(color = "black", size=7, angle=30, vjust=.8, hjust=0.8),
axis.text.y=element_text(color = "black", size=7, angle=0, vjust=.8, hjust=0.8)
) +
ggtitle("Correlation Heatmap")
ggplotly(h_train1_cormat_4)h_train1_cormat_5 <- ggplot(data = h_train1_cormat_3[h_train1_cormat_3$value > 0.5 |
h_train1_cormat_3$value < -0.5,],
aes(x=Var1, y=Var2, fill = value)) +
geom_tile() +
theme(axis.text.x=element_text(color = "black", size=7, angle=30, vjust=.8, hjust=0.8),
axis.text.y=element_text(color = "black", size=7, angle=0, vjust=.8, hjust=0.8)
) +
ggtitle("Correlation Heatmap")
ggplotly(h_train1_cormat_5)h_train1_cormat_4$data[h_train1_cormat_4$data$Var1 == "SalePrice_log" ,] %>%
arrange(desc(value)) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Var1 | Var2 | value |
|---|---|---|
| SalePrice_log | SalePrice_log | 1.00 |
| SalePrice_log | SalePrice | 0.95 |
| SalePrice_log | GrLivArea | 0.57 |
| SalePrice_log | OverallQual | 0.56 |
| SalePrice_log | GarageCars | 0.49 |
| SalePrice_log | TotRmsAbvGrd | 0.47 |
| SalePrice_log | GarageArea | 0.47 |
| SalePrice_log | FullBath | 0.46 |
| SalePrice_log | TotalBsmtSF | 0.45 |
| SalePrice_log | 1stFlrSF | 0.45 |
| SalePrice_log | YearBuilt | 0.41 |
| SalePrice_log | YearRemodAdd | 0.41 |
| SalePrice_log | GarageYrBlt | 0.38 |
| SalePrice_log | LotArea_log | 0.37 |
| SalePrice_log | FirePlaceDummy_ | 0.36 |
| SalePrice_log | Fireplaces | 0.35 |
| SalePrice_log | MasVnrArea | 0.34 |
| SalePrice_log | LotFrontage | 0.33 |
| SalePrice_log | LotFrontage_log | 0.32 |
| SalePrice_log | LotArea | 0.31 |
| SalePrice_log | MasVnrDummy_ | 0.29 |
| SalePrice_log | OpenPorchSF | 0.27 |
| SalePrice_log | BsmtFinSF1 | 0.26 |
| SalePrice_log | 2ndFlrSF | 0.26 |
| SalePrice_log | BedroomAbvGr | 0.26 |
| SalePrice_log | WoodDeckSF | 0.26 |
| SalePrice_log | HalfBath | 0.24 |
| SalePrice_log | BsmtUnfSF | 0.18 |
| SalePrice_log | BsmtFullBath | 0.16 |
| SalePrice_log | BsmtDummy_ | 0.10 |
| SalePrice_log | ScreenPorch | 0.09 |
| SalePrice_log | MoSold | 0.08 |
| SalePrice_log | Id | 0.06 |
| SalePrice_log | PoolArea | 0.06 |
| SalePrice_log | PoolDummy_ | 0.06 |
| SalePrice_log | 3SsnPorch | 0.02 |
| SalePrice_log | MiscVal | 0.01 |
| SalePrice_log | LowQualFinSF | 0.00 |
| SalePrice_log | BsmtFinSF2 | -0.01 |
| SalePrice_log | BsmtHalfBath | -0.02 |
| SalePrice_log | YrSold | -0.04 |
| SalePrice_log | MiscFeatureDummy_ | -0.06 |
| SalePrice_log | OverallCond | -0.07 |
| SalePrice_log | MSSubClass | -0.10 |
| SalePrice_log | KitchenAbvGr | -0.10 |
| SalePrice_log | EnclosedPorch | -0.10 |
| SalePrice_log | FenceDummy_ | -0.14 |
| SalePrice_log | GarageDummy_ | NA |
Analisando o gráfico acima, temos que:
- GrLivArea: cor = 0.57 )
- OverallQual: cor = 0.56 )
sendo essas as variáveis que possuem mais correlação com o preço de venda da casas.
ggpairs(ds2_all_semi[ds2_all_semi$table == "train", c(3:4,11,56,84, 76, 73, 47, 80, 88:90)],
ggplot2::aes(colour = SaleCondition, alpha = 0.5),
upper = list(continuous = wrap("cor", size=2, binwidth=0.5))
) + theme_bw()Conseguimor ver que a variável Área total do porão parece ser linearmente relacionada com o log to preço de venda da casa.
Gráficos Gerais
SalePrice_log & OverallQual_
plot(x=ds2_all_semi[ds2_all_semi$table == "train", ]$OverallQual_,
y=ds2_all_semi[ds2_all_semi$table == "train", ]$SalePrice_log,
xlab = "OverallQual_", ylab = "SalePrice_log")SalePrice_log & GrLivArea
plot(x=ds2_all_semi[ds2_all_semi$table == "train", ]$GrLivArea,
y=ds2_all_semi[ds2_all_semi$table == "train", ]$SalePrice_log,
xlab = "GrLivArea", ylab = "SalePrice_log")SalePrice_log & GarageCars
plot(x=ds2_all_semi[ds2_all_semi$table == "train", ]$GarageCars,
y=ds2_all_semi[ds2_all_semi$table == "train", ]$SalePrice_log,
xlab = "GarageCars", ylab = "SalePrice_log")SalePrice_log & TotalBsmtSF
plot(x=ds2_all_semi[ds2_all_semi$table == "train", ]$TotalBsmtSF,
y=ds2_all_semi[ds2_all_semi$table == "train", ]$SalePrice_log,
xlab = "TotalBsmtSF", ylab = "SalePrice_log")SalePrice_log & 1stFlrSF
plot(x=ds2_all_semi[ds2_all_semi$table == "train", ]$`1stFlrSF`,
y=ds2_all_semi[ds2_all_semi$table == "train", ]$SalePrice_log,
xlab = "1stFlrSF", ylab = "SalePrice_log")SalePrice_log & YearBuilt
boxplot(ds2_all_semi[ds2_all_semi$table == "train", ]$SalePrice_log ~
ds2_all_semi[ds2_all_semi$table == "train", ]$YearBuilt,
xlab = "YearBuilt", ylab = "SalePrice_log")Modelagem [Modelling]
A variável resposta é do tipo quantitativa contínua, pois ão os
valores em que as casas foram vendidas. Assim, podemos usar Regressão Linear, e como já foi mostrado
acima, vamos usar o log do preço de venda ao invés da variável
original.
ds2_all_train <- ds2_all_semi[ds2_all_semi$table == "train" & ds2_all_semi$remove_rows == 0, -c(2,91,46)]
ds2_all_test <- ds2_all_semi[ds2_all_semi$table == "test" & ds2_all_semi$remove_rows == 0, -c(2,91,46)]Modelo 1
Modelo com todas as variáveis.
##Model 1
houses_model1 <- lm(SalePrice_log ~ . ,
data = ds2_all_train[,-c(1,2)])
summary(houses_model1)##
## Call:
## lm(formula = SalePrice_log ~ ., data = ds2_all_train[, -c(1,
## 2)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68950 -0.04404 0.00093 0.04902 0.68823
##
## Coefficients: (18 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.445272939 6.664463206 1.417 0.156673
## `1stFlrSF` 0.000087859 0.000090567 0.970 0.332197
## `2ndFlrSF` 0.000094861 0.000084729 1.120 0.263125
## `3SsnPorch` 0.000151058 0.000101480 1.489 0.136875
## Alley_Grvl 0.008003877 0.019423966 0.412 0.680369
## Alley_Pave 0.032578630 0.022542689 1.445 0.148670
## BedroomAbvGr 0.005403449 0.006463330 0.836 0.403316
## BldgType2fmCon -0.110650935 0.126969824 -0.871 0.383674
## BldgTypeDuplex -0.025038660 0.034407879 -0.728 0.466942
## BldgTypeTwnhs -0.010116974 0.070666494 -0.143 0.886184
## BldgTypeTwnhsE 0.013549673 0.067210114 0.202 0.840263
## BsmtDummy_ -3.040371047 6.062429219 -0.502 0.616106
## BsmtCond_.L 3.075055785 4.300318121 0.715 0.474704
## BsmtCond_.Q -1.980372359 2.826051449 -0.701 0.483594
## BsmtCond_.C 0.646667919 0.954842115 0.677 0.498380
## BsmtCond_^4 NA NA NA NA
## BsmtExposure_.L 0.056815304 0.048244347 1.178 0.239172
## BsmtExposure_.Q -0.006540911 0.043595850 -0.150 0.880763
## BsmtExposure_.C 0.021178930 0.026540327 0.798 0.425037
## BsmtExposure_^4 NA NA NA NA
## BsmtFinSF1 0.000147958 0.000024713 5.987 0.00000000283628147 ***
## BsmtFinSF2 0.000139854 0.000041414 3.377 0.000757 ***
## BsmtFinType1_.L -0.146692672 0.187435394 -0.783 0.434002
## BsmtFinType1_.Q 0.174672510 0.182923561 0.955 0.339828
## BsmtFinType1_.C -0.131003342 0.137533876 -0.953 0.341031
## BsmtFinType1_^4 0.077718831 0.080984435 0.960 0.337416
## BsmtFinType1_^5 -0.019730971 0.034979843 -0.564 0.572816
## BsmtFinType1_^6 NA NA NA NA
## BsmtFinType2_.L -0.524605497 0.300899772 -1.743 0.081516 .
## BsmtFinType2_.Q 0.540580802 0.292328232 1.849 0.064677 .
## BsmtFinType2_.C -0.373627820 0.221675314 -1.685 0.092163 .
## BsmtFinType2_^4 0.203491799 0.128369319 1.585 0.113189
## BsmtFinType2_^5 -0.140003210 0.066684052 -2.100 0.035986 *
## BsmtFinType2_^6 NA NA NA NA
## BsmtFullBath 0.025123416 0.009039814 2.779 0.005536 **
## BsmtHalfBath 0.008603663 0.013825206 0.622 0.533854
## BsmtQual_.L -0.040120177 0.069962710 -0.573 0.566449
## BsmtQual_.Q 0.036478582 0.049150205 0.742 0.458123
## BsmtQual_.C 0.002161504 0.023622498 0.092 0.927109
## BsmtQual_^4 NA NA NA NA
## BsmtUnfSF 0.000092249 0.000022559 4.089 0.00004622374626898 ***
## CentralAir_Yes 0.054437351 0.017783930 3.061 0.002256 **
## Condition1Feedr 0.032967524 0.022945831 1.437 0.151054
## Condition1Norm 0.086777602 0.019123272 4.538 0.00000626887814370 ***
## Condition1PosA 0.045699297 0.045591705 1.002 0.316377
## Condition1PosN 0.093634887 0.034030407 2.752 0.006024 **
## Condition1RRAe -0.031907091 0.041362164 -0.771 0.440621
## Condition1RRAn 0.058638267 0.031535897 1.859 0.063219 .
## Condition1RRNe 0.014450171 0.079142976 0.183 0.855157
## Condition1RRNn 0.072838364 0.058505353 1.245 0.213385
## Condition2Feedr 0.193409940 0.123161096 1.570 0.116596
## Condition2Norm 0.169925200 0.111562937 1.523 0.127995
## Condition2PosA 0.361693476 0.181897986 1.988 0.046995 *
## Condition2PosN -0.677951784 0.141949224 -4.776 0.00000201386966487 ***
## Condition2RRAe -0.704800999 0.334725005 -2.106 0.035450 *
## Condition2RRAn 0.077538819 0.157213646 0.493 0.621959
## Condition2RRNn 0.164572750 0.139323833 1.181 0.237753
## ElectricalFuseF -0.004792882 0.027060393 -0.177 0.859446
## ElectricalFuseP -0.102365794 0.085522751 -1.197 0.231571
## ElectricalMix 0.902287332 1.482118925 0.609 0.542787
## ElectricalSBrkr -0.031429039 0.013818263 -2.274 0.023118 *
## EnclosedPorch 0.000115499 0.000057672 2.003 0.045439 *
## ExterCond_.L 0.136190133 0.106661783 1.277 0.201911
## ExterCond_.Q -0.052352105 0.088372904 -0.592 0.553697
## ExterCond_.C 0.054460595 0.055003908 0.990 0.322319
## ExterCond_^4 0.009275298 0.023426937 0.396 0.692233
## Exterior1stAsphShn 0.018716989 0.150125882 0.125 0.900802
## Exterior1stBrkComm -0.266367659 0.132088764 -2.017 0.043967 *
## Exterior1stBrkFace 0.091455184 0.059930173 1.526 0.127272
## Exterior1stCBlock -0.153586692 0.127971982 -1.200 0.230320
## Exterior1stCemntBd -0.071167407 0.089182524 -0.798 0.425034
## Exterior1stHdBoard -0.000652808 0.061136362 -0.011 0.991482
## Exterior1stImStucc 0.020093421 0.129065749 0.156 0.876309
## Exterior1stMetalSd 0.050374341 0.069079363 0.729 0.466010
## Exterior1stPlywood 0.005307285 0.060030636 0.088 0.929566
## Exterior1stStone 0.037803432 0.113015094 0.334 0.738063
## Exterior1stStucco 0.032833624 0.067603922 0.486 0.627287
## Exterior1stVinylSd 0.010507384 0.063113781 0.166 0.867805
## Exterior1stWd Sdng -0.024276280 0.058729686 -0.413 0.679421
## Exterior1stWdShing 0.023096146 0.063000914 0.367 0.713983
## Exterior2ndAsphShn 0.045066083 0.102013774 0.442 0.658741
## Exterior2ndBrk Cmn 0.098241134 0.094430805 1.040 0.298391
## Exterior2ndBrkFace -0.032134623 0.061598307 -0.522 0.601991
## Exterior2ndCBlock NA NA NA NA
## Exterior2ndCmentBd 0.125866513 0.086988217 1.447 0.148182
## Exterior2ndHdBoard 0.013748235 0.058346421 0.236 0.813760
## Exterior2ndImStucc 0.017637653 0.067164300 0.263 0.792901
## Exterior2ndMetalSd -0.008829215 0.066880939 -0.132 0.894996
## Exterior2ndOther -0.085158811 0.123464627 -0.690 0.490493
## Exterior2ndPlywood 0.010148482 0.056634515 0.179 0.857817
## Exterior2ndStone 0.007602025 0.084643774 0.090 0.928452
## Exterior2ndStucco 0.000274382 0.064390612 0.004 0.996601
## Exterior2ndVinylSd 0.024623306 0.060332806 0.408 0.683257
## Exterior2ndWd Sdng 0.043697237 0.056268949 0.777 0.437564
## Exterior2ndWd Shng 0.004687541 0.058696184 0.080 0.936361
## ExterQual_.L -0.044033860 0.040318987 -1.092 0.274997
## ExterQual_.Q 0.035489287 0.029123228 1.219 0.223245
## ExterQual_.C -0.013677312 0.014455511 -0.946 0.344259
## FenceDummy_ -0.022463836 0.033938958 -0.662 0.508172
## Fence_GdPrv 0.016196968 0.037440199 0.433 0.665378
## Fence_GdWo -0.019873353 0.036589366 -0.543 0.587133
## Fence_MnPrv 0.012154413 0.034694346 0.350 0.726155
## Fence_MnWw NA NA NA NA
## FirePlaceDummy_ 0.311834458 0.249548333 1.250 0.211697
## FireplaceQu_.L -0.191958730 0.152138729 -1.262 0.207295
## FireplaceQu_.Q 0.157890400 0.142558346 1.108 0.268284
## FireplaceQu_.C -0.123362767 0.091395109 -1.350 0.177349
## FireplaceQu_^4 0.036996165 0.037463065 0.988 0.323583
## FireplaceQu_^5 NA NA NA NA
## Fireplaces 0.017349622 0.011752131 1.476 0.140133
## FoundationCBlock 0.006393041 0.014815407 0.432 0.666175
## FoundationPConc 0.029196755 0.015853318 1.842 0.065774 .
## FoundationSlab -0.001279006 0.046184922 -0.028 0.977912
## FoundationStone 0.086460472 0.052208032 1.656 0.097974 .
## FoundationWood -0.122249736 0.067336280 -1.816 0.069701 .
## FullBath 0.027593779 0.010177670 2.711 0.006802 **
## FunctionalMaj2 -0.277554753 0.067536424 -4.110 0.00004237132855035 ***
## FunctionalMin1 -0.011446829 0.040655501 -0.282 0.778333
## FunctionalMin2 -0.043187008 0.041485228 -1.041 0.298080
## FunctionalMod -0.107010152 0.049576353 -2.158 0.031092 *
## FunctionalSev -0.282356071 0.135553951 -2.083 0.037469 *
## FunctionalTyp 0.004708831 0.036255061 0.130 0.896683
## GarageArea 0.000117805 0.000035215 3.345 0.000848 ***
## GarageCars 0.017250344 0.010478907 1.646 0.099991 .
## GarageDummy_ 1.171128298 0.593479097 1.973 0.048693 *
## GarageCond_.L -0.397328697 0.287290820 -1.383 0.166922
## GarageCond_.Q 0.028754417 0.300990481 0.096 0.923908
## GarageCond_.C -0.201930863 0.135627848 -1.489 0.136793
## GarageCond_^4 -0.039312517 0.099366410 -0.396 0.692449
## GarageCond_^5 NA NA NA NA
## GarageFinish_Fin 0.009559999 0.011081134 0.863 0.388464
## GarageFinish_RFn 0.014324603 0.009901745 1.447 0.148255
## GarageFinish_Unf NA NA NA NA
## GarageQual_.L -0.270777481 0.414111788 -0.654 0.513320
## GarageQual_.Q 0.644670220 0.366129054 1.761 0.078537 .
## GarageQual_.C -0.184169018 0.174579040 -1.055 0.291674
## GarageQual_^4 0.188069595 0.092881468 2.025 0.043111 *
## GarageQual_^5 NA NA NA NA
## GarageType_2Types -0.145818373 0.051737647 -2.818 0.004907 **
## GarageType_Attchd -0.000763719 0.010753243 -0.071 0.943392
## GarageType_Basment -0.008190734 0.031960141 -0.256 0.797780
## GarageType_BuiltIn -0.010263696 0.018543978 -0.553 0.580041
## GarageType_CarPort 0.015550221 0.046173055 0.337 0.736342
## GarageType_Detchd NA NA NA NA
## GrLivArea 0.000131977 0.000088654 1.489 0.136843
## HalfBath 0.027607277 0.009610500 2.873 0.004144 **
## HeatingGasA 0.144812212 0.118592803 1.221 0.222298
## HeatingGasW 0.199141597 0.122221471 1.629 0.103507
## HeatingGrav -0.067148788 0.129536226 -0.518 0.604292
## HeatingOthW 0.131663009 0.145315838 0.906 0.365097
## HeatingWall 0.167036994 0.136343391 1.225 0.220776
## HeatingQC_.L 0.069012793 0.076857065 0.898 0.369403
## HeatingQC_.Q -0.034893542 0.065082329 -0.536 0.591960
## HeatingQC_.C 0.037422916 0.040668653 0.920 0.357662
## HeatingQC_^4 -0.018772311 0.018265275 -1.028 0.304275
## HouseStyle1.5Unf 0.206833868 0.113685740 1.819 0.069113 .
## HouseStyle1Story -0.015592170 0.040580059 -0.384 0.700876
## HouseStyle2.5Fin -0.031640765 0.082664088 -0.383 0.701965
## HouseStyle2.5Unf 0.081868602 0.077767081 1.053 0.292676
## HouseStyle2Story -0.002066528 0.037567868 -0.055 0.956142
## HouseStyleSFoyer -0.010050362 0.053986994 -0.186 0.852349
## HouseStyleSLvl 0.050061827 0.063194670 0.792 0.428413
## KitchenAbvGr -0.053103714 0.028456272 -1.866 0.062270 .
## KitchenQual_.L 0.030003504 0.020134990 1.490 0.136462
## KitchenQual_.Q 0.033262350 0.014236234 2.336 0.019635 *
## KitchenQual_.C 0.009922890 0.007850449 1.264 0.206485
## LandContourHLS 0.028435438 0.023718108 1.199 0.230812
## LandContourLow 0.004249393 0.029920106 0.142 0.887085
## LandContourLvl 0.030436812 0.017117308 1.778 0.075641 .
## LandSlopeMod 0.035245072 0.018461242 1.909 0.056488 .
## LandSlopeSev -0.207249759 0.052037245 -3.983 0.00007232395635704 ***
## LotArea 0.000002613 0.000000532 4.912 0.00000103128690908 ***
## LotConfigCulDSac 0.033090785 0.014998530 2.206 0.027558 *
## LotConfigFR2 -0.030903749 0.018239898 -1.694 0.090475 .
## LotConfigFR3 -0.076328660 0.057088072 -1.337 0.181471
## LotConfigInside -0.015786028 0.008230917 -1.918 0.055367 .
## LotFrontage_2 0.000416990 0.000203867 2.045 0.041038 *
## LotShapeIR2 0.022767179 0.019182277 1.187 0.235513
## LotShapeIR3 0.026080503 0.040083475 0.651 0.515397
## LotShapeReg 0.010276264 0.007356104 1.397 0.162686
## LowQualFinSF NA NA NA NA
## MasVnrDummy_ 0.008740393 0.054428025 0.161 0.872447
## MasVnrArea 0.000018014 0.000026693 0.675 0.499891
## MasVnrType_BrkCmn -0.050651434 0.061534448 -0.823 0.410596
## MasVnrType_BrkFace 0.003040660 0.053954994 0.056 0.955068
## MasVnrType_Stone 0.010707835 0.055362722 0.193 0.846670
## MiscFeatureDummy_ 0.025162395 0.216947184 0.116 0.907685
## MiscFeature_Gar2 0.475576533 0.452685704 1.051 0.293674
## MiscFeature_Othr -0.048931583 0.233331931 -0.210 0.833932
## MiscFeature_Shed -0.008174406 0.213546133 -0.038 0.969471
## MiscFeature_TenC NA NA NA NA
## MiscVal -0.000028703 0.000028913 -0.993 0.321041
## MoSold -0.000841969 0.001133129 -0.743 0.457602
## MSSubClass_30 -0.066385166 0.021739803 -3.054 0.002312 **
## MSSubClass_40 -0.066457794 0.080257658 -0.828 0.407808
## MSSubClass_45 -0.222128184 0.115279404 -1.927 0.054237 .
## MSSubClass_50 -0.005352633 0.039925361 -0.134 0.893373
## MSSubClass_60 -0.034456391 0.035227130 -0.978 0.328216
## MSSubClass_70 0.012526961 0.038333617 0.327 0.743887
## MSSubClass_75 -0.054579869 0.075152636 -0.726 0.467828
## MSSubClass_80 -0.070101787 0.057809928 -1.213 0.225517
## MSSubClass_85 -0.016866191 0.048623780 -0.347 0.728750
## MSSubClass_90 NA NA NA NA
## MSSubClass_120 -0.048999617 0.066440675 -0.737 0.460969
## MSSubClass_160 -0.152536948 0.079986058 -1.907 0.056759 .
## MSSubClass_180 -0.074324722 0.089688845 -0.829 0.407445
## MSSubClass_190 0.082093056 0.129448085 0.634 0.526089
## MSZoningFV 0.507257436 0.056715436 8.944 < 0.0000000000000002 ***
## MSZoningRH 0.459405375 0.056449639 8.138 0.00000000000000101 ***
## MSZoningRL 0.469954925 0.048748364 9.640 < 0.0000000000000002 ***
## MSZoningRM 0.419635019 0.045840089 9.154 < 0.0000000000000002 ***
## NeighborhoodBlueste 0.042994824 0.091286257 0.471 0.637736
## NeighborhoodBrDale -0.000690402 0.053836997 -0.013 0.989770
## NeighborhoodBrkSide 0.025410867 0.044086222 0.576 0.564462
## NeighborhoodClearCr 0.028648590 0.042712467 0.671 0.502524
## NeighborhoodCollgCr -0.013920515 0.033435039 -0.416 0.677234
## NeighborhoodCrawfor 0.115649202 0.039695386 2.913 0.003643 **
## NeighborhoodEdwards -0.076832370 0.037119160 -2.070 0.038682 *
## NeighborhoodGilbert -0.005989834 0.035376559 -0.169 0.865577
## NeighborhoodIDOTRR -0.007161678 0.049931090 -0.143 0.885974
## NeighborhoodMeadowV -0.141506636 0.056385388 -2.510 0.012220 *
## NeighborhoodMitchel -0.049640680 0.037697474 -1.317 0.188157
## NeighborhoodNAmes -0.036925489 0.036181459 -1.021 0.307672
## NeighborhoodNoRidge 0.037503291 0.038988250 0.962 0.336292
## NeighborhoodNPkVill 0.000191903 0.063798006 0.003 0.997600
## NeighborhoodNridgHt 0.076817346 0.034624316 2.219 0.026705 *
## NeighborhoodNWAmes -0.036169347 0.036953587 -0.979 0.327892
## NeighborhoodOldTown -0.028726008 0.044652342 -0.643 0.520138
## NeighborhoodSawyer -0.026306301 0.037380125 -0.704 0.481728
## NeighborhoodSawyerW 0.001245753 0.036037458 0.035 0.972430
## NeighborhoodSomerst 0.026424328 0.041577607 0.636 0.525199
## NeighborhoodStoneBr 0.130732928 0.038541185 3.392 0.000717 ***
## NeighborhoodSWISU 0.012978591 0.044932308 0.289 0.772749
## NeighborhoodTimber 0.014362237 0.037694190 0.381 0.703257
## NeighborhoodVeenker 0.049618005 0.048097839 1.032 0.302470
## OpenPorchSF 0.000040980 0.000053525 0.766 0.444057
## OverallCond_.L -0.693702489 0.781309444 -0.888 0.374792
## OverallCond_.Q 0.955873073 0.795563817 1.202 0.229798
## OverallCond_.C -0.750575508 0.658791405 -1.139 0.254802
## OverallCond_^4 0.469494467 0.456123815 1.029 0.303545
## OverallCond_^5 -0.205417721 0.262979452 -0.781 0.434891
## OverallCond_^6 0.072118655 0.117923696 0.612 0.540940
## OverallCond_^7 -0.003595957 0.037932501 -0.095 0.924491
## OverallCond_^8 NA NA NA NA
## OverallQual_.L 0.623062772 0.083879178 7.428 0.00000000000021146 ***
## OverallQual_.Q -0.267740145 0.075131373 -3.564 0.000380 ***
## OverallQual_.C 0.216330376 0.062535500 3.459 0.000561 ***
## OverallQual_^4 -0.162365647 0.053538655 -3.033 0.002477 **
## OverallQual_^5 0.042387295 0.046642341 0.909 0.363657
## OverallQual_^6 -0.038000300 0.037702798 -1.008 0.313715
## OverallQual_^7 0.000664531 0.027296199 0.024 0.980581
## OverallQual_^8 0.000274743 0.017189173 0.016 0.987250
## OverallQual_^9 -0.004159637 0.009087225 -0.458 0.647220
## PavedDriveP -0.017682013 0.025572935 -0.691 0.489429
## PavedDriveY 0.011565676 0.016001476 0.723 0.469954
## PoolArea 0.001433329 0.001047376 1.368 0.171419
## PoolDummy_ -1.251618726 0.730576714 -1.713 0.086941 .
## PoolQC_.L 0.417470059 0.313550053 1.331 0.183306
## PoolQC_.Q -0.204436262 0.248829036 -0.822 0.411475
## PoolQC_.C NA NA NA NA
## RoofMatlCompShg 2.554739903 0.244189655 10.462 < 0.0000000000000002 ***
## RoofMatlMembran 2.999456976 0.289234765 10.370 < 0.0000000000000002 ***
## RoofMatlMetal 2.837026139 0.286760810 9.893 < 0.0000000000000002 ***
## RoofMatlRoll 2.555356893 0.271146428 9.424 < 0.0000000000000002 ***
## RoofMatlTar&Grv 2.571263567 0.261107554 9.848 < 0.0000000000000002 ***
## RoofMatlWdShake 2.502067474 0.255537071 9.791 < 0.0000000000000002 ***
## RoofMatlWdShngl 2.616999481 0.248367475 10.537 < 0.0000000000000002 ***
## RoofStyleGable 0.021500243 0.083911505 0.256 0.797822
## RoofStyleGambrel 0.004287777 0.092064495 0.047 0.962861
## RoofStyleHip 0.020691550 0.084162719 0.246 0.805840
## RoofStyleMansard 0.070973087 0.097541404 0.728 0.466991
## RoofStyleShed 0.513396114 0.174309907 2.945 0.003290 **
## SaleConditionAdjLand 0.113591167 0.066655549 1.704 0.088618 .
## SaleConditionAlloca 0.059815217 0.043150968 1.386 0.165954
## SaleConditionFamily 0.017831900 0.027774990 0.642 0.520990
## SaleConditionNormal 0.060991587 0.013253675 4.602 0.00000464269700382 ***
## SaleConditionPartial 0.031257290 0.067681482 0.462 0.644289
## SaleTypeCon 0.090469010 0.079521495 1.138 0.255492
## SaleTypeConLD 0.104755653 0.045770712 2.289 0.022274 *
## SaleTypeConLI -0.025852769 0.052678319 -0.491 0.623683
## SaleTypeConLw 0.017920986 0.056716579 0.316 0.752078
## SaleTypeCWD 0.045030248 0.058767353 0.766 0.443684
## SaleTypeNew 0.057201304 0.070579539 0.810 0.417845
## SaleTypeOth 0.060390376 0.066378410 0.910 0.363120
## SaleTypeWD -0.016541858 0.019174313 -0.863 0.388474
## ScreenPorch 0.000287243 0.000057067 5.033 0.00000055715083132 ***
## StreetPave 0.110068686 0.058747765 1.874 0.061237 .
## TotalBsmtSF NA NA NA NA
## TotRmsAbvGrd 0.001170065 0.004403557 0.266 0.790510
## UtilitiesNoSeWa -0.298669260 0.129475558 -2.307 0.021242 *
## WoodDeckSF 0.000088345 0.000026826 3.293 0.001020 **
## YearBuilt 0.001704179 0.000380058 4.484 0.00000804415592358 ***
## YearRemodAdd 0.000892482 0.000255912 3.487 0.000506 ***
## YrSold -0.002734777 0.002348348 -1.165 0.244436
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.102 on 1173 degrees of freedom
## Multiple R-squared: 0.9472, Adjusted R-squared: 0.9348
## F-statistic: 76.54 on 275 and 1173 DF, p-value: < 0.00000000000000022
Esse modelo nos dá um R² de 93.5 %. O que normalmente pode significar
que está mais ajustado do que o necessário (overfitting).
autoplot(houses_model1)Modelo 2
Vamos aplicar o stepwise.
houses_model1_step <- houses_model1 %>%
stepAIC(direction = "both", trace = FALSE)Abaixo temos a ANOVA do stepwise, mostrando o modelo inicial e o final.
houses_model1_step$anova## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## SalePrice_log ~ `1stFlrSF` + `2ndFlrSF` + `3SsnPorch` + Alley_ +
## BedroomAbvGr + BldgType + BsmtDummy_ + BsmtCond_ + BsmtExposure_ +
## BsmtFinSF1 + BsmtFinSF2 + BsmtFinType1_ + BsmtFinType2_ +
## BsmtFullBath + BsmtHalfBath + BsmtQual_ + BsmtUnfSF + CentralAir_ +
## Condition1 + Condition2 + Electrical + EnclosedPorch + ExterCond_ +
## Exterior1st + Exterior2nd + ExterQual_ + FenceDummy_ + Fence_ +
## FirePlaceDummy_ + FireplaceQu_ + Fireplaces + Foundation +
## FullBath + Functional + GarageArea + GarageCars + GarageDummy_ +
## GarageCond_ + GarageFinish_ + GarageQual_ + GarageType_ +
## GrLivArea + HalfBath + Heating + HeatingQC_ + HouseStyle +
## KitchenAbvGr + KitchenQual_ + LandContour + LandSlope + LotArea +
## LotConfig + LotFrontage_2 + LotShape + LowQualFinSF + MasVnrDummy_ +
## MasVnrArea + MasVnrType_ + MiscFeatureDummy_ + MiscFeature_ +
## MiscVal + MoSold + MSSubClass_ + MSZoning + Neighborhood +
## OpenPorchSF + OverallCond_ + OverallQual_ + PavedDrive +
## PoolArea + PoolDummy_ + PoolQC_ + RoofMatl + RoofStyle +
## SaleCondition + SaleType + ScreenPorch + Street + TotalBsmtSF +
## TotRmsAbvGrd + Utilities + WoodDeckSF + YearBuilt + YearRemodAdd +
## YrSold
##
## Final Model:
## SalePrice_log ~ `3SsnPorch` + BsmtExposure_ + BsmtFinSF1 + BsmtFinSF2 +
## BsmtFullBath + BsmtUnfSF + CentralAir_ + Condition1 + Condition2 +
## Electrical + EnclosedPorch + Exterior1st + Fireplaces + Foundation +
## FullBath + Functional + GarageArea + GarageCars + GarageCond_ +
## GarageQual_ + GarageType_ + GrLivArea + HalfBath + Heating +
## HeatingQC_ + KitchenAbvGr + KitchenQual_ + LandSlope + LotArea +
## LotConfig + LotFrontage_2 + MasVnrType_ + MSSubClass_ + MSZoning +
## Neighborhood + OverallCond_ + OverallQual_ + PoolArea + PoolQC_ +
## RoofMatl + RoofStyle + SaleCondition + SaleType + ScreenPorch +
## Street + Utilities + WoodDeckSF + YearBuilt + YearRemodAdd +
## FenceDummy_ + LowQualFinSF + ExterCond_
##
##
## Step Df Deviance Resid. Df Resid. Dev
## 1 1173 12.20025
## 2 - TotalBsmtSF 0 0.000000000000000000000 1173 12.20025
## 3 - PoolDummy_ 0 0.000000000000010658141 1173 12.20025
## 4 - MiscFeatureDummy_ 0 0.000000000000007105427 1173 12.20025
## 5 - LowQualFinSF 0 0.000000000000000000000 1173 12.20025
## 6 - GarageDummy_ 0 0.000000000000003552714 1173 12.20025
## 7 - FirePlaceDummy_ 0 0.000000000000007105427 1173 12.20025
## 8 - FenceDummy_ 0 0.000000000000000000000 1173 12.20025
## 9 - BsmtDummy_ 0 0.000000000000021316282 1173 12.20025
## 10 - Exterior2nd 14 0.139208557697136470210 1187 12.33946
## 11 - FireplaceQu_ 5 0.035569769176275300993 1192 12.37503
## 12 - MiscFeature_ 4 0.021123364712414272049 1196 12.39616
## 13 - ExterQual_ 3 0.007775983259554308802 1199 12.40393
## 14 - BldgType 3 0.011233859814323565729 1202 12.41517
## 15 - HouseStyle 7 0.094423567560646404218 1209 12.50959
## 16 - BsmtQual_ 3 0.025286597369474250740 1212 12.53488
## 17 - PavedDrive 2 0.014050766404931636089 1214 12.54893
## 18 - LotShape 3 0.032901078447157061646 1217 12.58183
## 19 - MiscVal 1 0.000392805301846266275 1218 12.58222
## 20 - MasVnrDummy_ 1 0.000543791402053628303 1219 12.58276
## 21 - GarageFinish_ 2 0.019805218824139103617 1221 12.60257
## 22 - BsmtFinType1_ 5 0.069808384233837728061 1226 12.67238
## 23 - BsmtFinType2_ 5 0.059448110061602221776 1231 12.73183
## 24 - Alley_ 2 0.017042286764628045148 1233 12.74887
## 25 - TotRmsAbvGrd 1 0.003042152155099842048 1234 12.75191
## 26 - LandContour 3 0.040491804005055698212 1237 12.79240
## 27 - BedroomAbvGr 1 0.005958859604271893318 1238 12.79836
## 28 - MasVnrArea 1 0.006463273712499884027 1239 12.80482
## 29 - BsmtCond_ 2 0.026785073492627375913 1241 12.83161
## 30 - Fence_ 4 0.059589501244804665703 1245 12.89120
## 31 + FenceDummy_ 1 0.028468327195056986056 1244 12.86273
## 32 - YrSold 1 0.008930977115221949703 1245 12.87166
## 33 - MoSold 1 0.015995090802931599683 1246 12.88766
## 34 - OpenPorchSF 1 0.015730955107287059036 1247 12.90339
## 35 - ExterCond_ 4 0.070158192292607779450 1251 12.97355
## 36 - `1stFlrSF` 1 0.016142182511673652812 1252 12.98969
## 37 - `2ndFlrSF` 1 0.009009738751784013289 1253 12.99870
## 38 + LowQualFinSF 1 0.022328129377410022016 1252 12.97637
## 39 - BsmtHalfBath 1 0.016684227979824939325 1253 12.99305
## 40 + ExterCond_ 4 0.071605291085855782285 1249 12.92145
## AIC
## 1 -6370.122
## 2 -6370.122
## 3 -6370.122
## 4 -6370.122
## 5 -6370.122
## 6 -6370.122
## 7 -6370.122
## 8 -6370.122
## 9 -6370.122
## 10 -6381.683
## 11 -6387.512
## 12 -6393.040
## 13 -6398.132
## 14 -6402.820
## 15 -6405.841
## 16 -6408.915
## 17 -6411.292
## 18 -6413.498
## 19 -6415.453
## 20 -6417.390
## 21 -6419.111
## 22 -6421.107
## 23 -6424.325
## 24 -6426.387
## 25 -6428.041
## 26 -6429.448
## 27 -6430.773
## 28 -6432.041
## 29 -6433.013
## 30 -6434.300
## 31 -6435.503
## 32 -6436.498
## 33 -6436.698
## 34 -6436.930
## 35 -6437.073
## 36 -6437.272
## 37 -6438.267
## 38 -6438.758
## 39 -6438.896
## 40 -6438.904
summary(houses_model1_step)##
## Call:
## lm(formula = SalePrice_log ~ `3SsnPorch` + BsmtExposure_ + BsmtFinSF1 +
## BsmtFinSF2 + BsmtFullBath + BsmtUnfSF + CentralAir_ + Condition1 +
## Condition2 + Electrical + EnclosedPorch + Exterior1st + Fireplaces +
## Foundation + FullBath + Functional + GarageArea + GarageCars +
## GarageCond_ + GarageQual_ + GarageType_ + GrLivArea + HalfBath +
## Heating + HeatingQC_ + KitchenAbvGr + KitchenQual_ + LandSlope +
## LotArea + LotConfig + LotFrontage_2 + MasVnrType_ + MSSubClass_ +
## MSZoning + Neighborhood + OverallCond_ + OverallQual_ + PoolArea +
## PoolQC_ + RoofMatl + RoofStyle + SaleCondition + SaleType +
## ScreenPorch + Street + Utilities + WoodDeckSF + YearBuilt +
## YearRemodAdd + FenceDummy_ + LowQualFinSF + ExterCond_, data = ds2_all_train[,
## -c(1, 2)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.71481 -0.04430 0.00000 0.05169 0.71481
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5939901752 0.8505527102 1.874 0.061155
## `3SsnPorch` 0.0001702361 0.0000986421 1.726 0.084631
## BsmtExposure_.L 0.0360571008 0.0262087980 1.376 0.169142
## BsmtExposure_.Q 0.0178964773 0.0213813806 0.837 0.402746
## BsmtExposure_.C 0.0088692989 0.0133846662 0.663 0.507679
## BsmtExposure_^4 0.0055779590 0.0096623689 0.577 0.563850
## BsmtFinSF1 0.0001550841 0.0000186488 8.316 0.000000000000000235
## BsmtFinSF2 0.0001287205 0.0000245961 5.233 0.000000195073367119
## BsmtFullBath 0.0259546275 0.0079501827 3.265 0.001126
## BsmtUnfSF 0.0000873422 0.0000171695 5.087 0.000000419266057225
## CentralAir_Yes 0.0562325229 0.0168047792 3.346 0.000844
## Condition1Feedr 0.0401390992 0.0220354363 1.822 0.068759
## Condition1Norm 0.0857642955 0.0184264541 4.654 0.000003595519072802
## Condition1PosA 0.0515352869 0.0439986750 1.171 0.241705
## Condition1PosN 0.0856761768 0.0325265509 2.634 0.008542
## Condition1RRAe -0.0294034047 0.0404187177 -0.727 0.467074
## Condition1RRAn 0.0567171792 0.0302220454 1.877 0.060795
## Condition1RRNe 0.0104477439 0.0778754655 0.134 0.893298
## Condition1RRNn 0.0882010720 0.0552840961 1.595 0.110872
## Condition2Feedr 0.0663088473 0.1032794893 0.642 0.520970
## Condition2Norm 0.0541635870 0.0886183470 0.611 0.541178
## Condition2PosA 0.2793525829 0.1632918100 1.711 0.087374
## Condition2PosN -0.8084067809 0.1239798912 -6.520 0.000000000101508786
## Condition2RRAe -0.5192829271 0.2039285133 -2.546 0.011003
## Condition2RRAn -0.0595534228 0.1399725127 -0.425 0.670571
## Condition2RRNn 0.0147132963 0.1184915689 0.124 0.901199
## ElectricalFuseF 0.0069076109 0.0255303731 0.271 0.786771
## ElectricalFuseP -0.0962344868 0.0768907670 -1.252 0.210959
## ElectricalMix -0.2178673962 0.1677568638 -1.299 0.194283
## ElectricalSBrkr -0.0296079760 0.0130852418 -2.263 0.023825
## EnclosedPorch 0.0001211373 0.0000550646 2.200 0.027996
## Exterior1stAsphShn 0.0594209412 0.1135780928 0.523 0.600947
## Exterior1stBrkComm -0.1724913735 0.0921228781 -1.872 0.061384
## Exterior1stBrkFace 0.0812902414 0.0335430626 2.423 0.015515
## Exterior1stCBlock -0.0909096805 0.1106933886 -0.821 0.411647
## Exterior1stCemntBd 0.0433312336 0.0354469219 1.222 0.221777
## Exterior1stHdBoard 0.0054102264 0.0311657088 0.174 0.862212
## Exterior1stImStucc 0.0205899743 0.1093050342 0.188 0.850616
## Exterior1stMetalSd 0.0363806981 0.0304762157 1.194 0.232806
## Exterior1stPlywood 0.0079312377 0.0325583744 0.244 0.807580
## Exterior1stStone 0.0387793983 0.0886249181 0.438 0.661775
## Exterior1stStucco 0.0396440683 0.0382939444 1.035 0.300749
## Exterior1stVinylSd 0.0282002659 0.0306399156 0.920 0.357554
## Exterior1stWd Sdng 0.0077095913 0.0302968609 0.254 0.799176
## Exterior1stWdShing 0.0231456227 0.0367689430 0.629 0.529144
## Fireplaces 0.0251342455 0.0058429127 4.302 0.000018273836929471
## FoundationCBlock -0.0003759814 0.0140265191 -0.027 0.978620
## FoundationPConc 0.0265828095 0.0153474844 1.732 0.083509
## FoundationSlab -0.0006266788 0.0438561618 -0.014 0.988601
## FoundationStone 0.0882249823 0.0487181139 1.811 0.070392
## FoundationWood -0.1480473318 0.0653095390 -2.267 0.023569
## FullBath 0.0297405779 0.0095691024 3.108 0.001926
## FunctionalMaj2 -0.2614801931 0.0641778918 -4.074 0.000049062098673843
## FunctionalMin1 -0.0022479944 0.0386133151 -0.058 0.953584
## FunctionalMin2 -0.0254866644 0.0388651647 -0.656 0.512092
## FunctionalMod -0.1105689899 0.0461520562 -2.396 0.016733
## FunctionalSev -0.2767224374 0.1278759345 -2.164 0.030654
## FunctionalTyp 0.0220029869 0.0342871656 0.642 0.521168
## GarageArea 0.0001144000 0.0000334845 3.417 0.000655
## GarageCars 0.0196957828 0.0100996224 1.950 0.051382
## GarageCond_.L 0.0678343947 0.3505109246 0.194 0.846575
## GarageCond_.Q -0.3867549131 0.3382736315 -1.143 0.253125
## GarageCond_.C 0.1296990494 0.2024959423 0.641 0.521964
## GarageCond_^4 -0.2087532238 0.1152065349 -1.812 0.070228
## GarageCond_^5 0.0770936152 0.0353222106 2.183 0.029252
## GarageQual_.L -0.0306606142 0.3743794837 -0.082 0.934742
## GarageQual_.Q 0.4219833971 0.3306349084 1.276 0.202093
## GarageQual_.C -0.0712230889 0.1594069675 -0.447 0.655097
## GarageQual_^4 0.1220914437 0.0848839862 1.438 0.150590
## GarageQual_^5 NA NA NA NA
## GarageType_2Types -0.1473163207 0.0485613253 -3.034 0.002466
## GarageType_Attchd 0.0034001395 0.0098296822 0.346 0.729472
## GarageType_Basment 0.0042347403 0.0299485052 0.141 0.887576
## GarageType_BuiltIn -0.0084976154 0.0168305688 -0.505 0.613724
## GarageType_CarPort -0.0109997689 0.0422427947 -0.260 0.794603
## GarageType_Detchd NA NA NA NA
## GrLivArea 0.0002326115 0.0000156904 14.825 < 0.0000000000000002
## HalfBath 0.0310538257 0.0091381306 3.398 0.000700
## HeatingGasA 0.1530122251 0.1166363901 1.312 0.189804
## HeatingGasW 0.2081371498 0.1200491517 1.734 0.083206
## HeatingGrav -0.0382546636 0.1263054197 -0.303 0.762036
## HeatingOthW 0.1563865734 0.1408509942 1.110 0.267084
## HeatingWall 0.1682731182 0.1323533944 1.271 0.203826
## HeatingQC_.L 0.0672561636 0.0752402239 0.894 0.371555
## HeatingQC_.Q -0.0312863218 0.0636278020 -0.492 0.623012
## HeatingQC_.C 0.0356583328 0.0395634082 0.901 0.367605
## HeatingQC_^4 -0.0199051373 0.0175715946 -1.133 0.257515
## KitchenAbvGr -0.0536142644 0.0248563834 -2.157 0.031199
## KitchenQual_.L 0.0275260255 0.0190288481 1.447 0.148276
## KitchenQual_.Q 0.0330279270 0.0133850279 2.468 0.013738
## KitchenQual_.C 0.0121970718 0.0073646491 1.656 0.097940
## LandSlopeMod 0.0214325640 0.0156802388 1.367 0.171918
## LandSlopeSev -0.2149337013 0.0497921933 -4.317 0.000017096444735594
## LotArea 0.0000023320 0.0000004784 4.874 0.000001231314500616
## LotConfigCulDSac 0.0308504766 0.0141473336 2.181 0.029395
## LotConfigFR2 -0.0262207133 0.0175747281 -1.492 0.135963
## LotConfigFR3 -0.0708502987 0.0555036949 -1.276 0.202017
## LotConfigInside -0.0138316620 0.0078668466 -1.758 0.078955
## LotFrontage_2 0.0004710405 0.0001943370 2.424 0.015499
## MasVnrType_BrkCmn -0.0384680288 0.0293104678 -1.312 0.189615
## MasVnrType_BrkFace 0.0152144561 0.0080042494 1.901 0.057559
## MasVnrType_Stone 0.0175956814 0.0130390453 1.349 0.177434
## MSSubClass_30 -0.0629128298 0.0207457890 -3.033 0.002475
## MSSubClass_40 -0.0779182228 0.0643329787 -1.211 0.226059
## MSSubClass_45 -0.0084977794 0.0360908788 -0.235 0.813894
## MSSubClass_50 0.0064697005 0.0173381299 0.373 0.709101
## MSSubClass_60 -0.0142774631 0.0157975612 -0.904 0.366288
## MSSubClass_70 0.0171098742 0.0248556554 0.688 0.491348
## MSSubClass_75 -0.0124097477 0.0376824259 -0.329 0.741966
## MSSubClass_80 -0.0024194686 0.0175374099 -0.138 0.890294
## MSSubClass_85 -0.0083286644 0.0255520663 -0.326 0.744518
## MSSubClass_90 -0.0207801960 0.0282754640 -0.735 0.462526
## MSSubClass_120 -0.0337612584 0.0176946050 -1.908 0.056621
## MSSubClass_160 -0.1207651163 0.0266628197 -4.529 0.000006482500190032
## MSSubClass_180 -0.0491892341 0.0464786924 -1.058 0.290115
## MSSubClass_190 -0.0060458822 0.0290128639 -0.208 0.834961
## MSZoningFV 0.4780987817 0.0536430388 8.913 < 0.0000000000000002
## MSZoningRH 0.4391174056 0.0530905191 8.271 0.000000000000000337
## MSZoningRL 0.4373475809 0.0454806318 9.616 < 0.0000000000000002
## MSZoningRM 0.3814519058 0.0425375972 8.967 < 0.0000000000000002
## NeighborhoodBlueste 0.0445116275 0.0857232939 0.519 0.603680
## NeighborhoodBrDale -0.0037230421 0.0495338732 -0.075 0.940098
## NeighborhoodBrkSide 0.0488352892 0.0409475155 1.193 0.233240
## NeighborhoodClearCr 0.0294752067 0.0391437679 0.753 0.451593
## NeighborhoodCollgCr -0.0113941116 0.0311035473 -0.366 0.714182
## NeighborhoodCrawfor 0.1187842789 0.0370216306 3.209 0.001368
## NeighborhoodEdwards -0.0694773296 0.0350207222 -1.984 0.047487
## NeighborhoodGilbert -0.0111662987 0.0333522834 -0.335 0.737833
## NeighborhoodIDOTRR 0.0057583280 0.0469397919 0.123 0.902384
## NeighborhoodMeadowV -0.1273509577 0.0529941105 -2.403 0.016401
## NeighborhoodMitchel -0.0437587740 0.0352078258 -1.243 0.214149
## NeighborhoodNAmes -0.0271832809 0.0336423591 -0.808 0.419240
## NeighborhoodNoRidge 0.0376324454 0.0355607962 1.058 0.290143
## NeighborhoodNPkVill 0.0387884483 0.0490504292 0.791 0.429218
## NeighborhoodNridgHt 0.0771237095 0.0317876734 2.426 0.015398
## NeighborhoodNWAmes -0.0258621672 0.0346819281 -0.746 0.455992
## NeighborhoodOldTown -0.0093279540 0.0413113648 -0.226 0.821397
## NeighborhoodSawyer -0.0179160176 0.0351466632 -0.510 0.610317
## NeighborhoodSawyerW -0.0016744072 0.0339786062 -0.049 0.960705
## NeighborhoodSomerst 0.0319077290 0.0393629818 0.811 0.417748
## NeighborhoodStoneBr 0.1148015258 0.0359860078 3.190 0.001457
## NeighborhoodSWISU 0.0069495298 0.0421297550 0.165 0.869006
## NeighborhoodTimber 0.0120139149 0.0352849246 0.340 0.733550
## NeighborhoodVeenker 0.0358974241 0.0457577146 0.785 0.432889
## OverallCond_.L -0.1319417955 0.0951550507 -1.387 0.165812
## OverallCond_.Q 0.3824512932 0.0911940039 4.194 0.000029365398310457
## OverallCond_.C -0.2787829500 0.0767179559 -3.634 0.000291
## OverallCond_^4 0.1358677986 0.0626837302 2.168 0.030384
## OverallCond_^5 -0.0090925339 0.0490347224 -0.185 0.852921
## OverallCond_^6 -0.0236868515 0.0351332901 -0.674 0.500309
## OverallCond_^7 0.0380427988 0.0225970994 1.684 0.092523
## OverallCond_^8 -0.0140791770 0.0126078844 -1.117 0.264339
## OverallQual_.L 0.6402649771 0.0759114784 8.434 < 0.0000000000000002
## OverallQual_.Q -0.2393413576 0.0676010883 -3.540 0.000414
## OverallQual_.C 0.1946001421 0.0572585586 3.399 0.000699
## OverallQual_^4 -0.1399556146 0.0504434429 -2.775 0.005611
## OverallQual_^5 0.0287037317 0.0446591299 0.643 0.520518
## OverallQual_^6 -0.0313080904 0.0362565140 -0.864 0.388019
## OverallQual_^7 -0.0036557045 0.0263099231 -0.139 0.889514
## OverallQual_^8 0.0014004794 0.0165260612 0.085 0.932479
## OverallQual_^9 -0.0028880407 0.0087529552 -0.330 0.741493
## PoolArea 0.0013123476 0.0007527593 1.743 0.081513
## PoolQC_.L -0.3991693831 0.2868619650 -1.392 0.164321
## PoolQC_.Q 0.3592532661 0.2691699291 1.335 0.182227
## PoolQC_.C -0.2965629133 0.0995802153 -2.978 0.002956
## RoofMatlCompShg 2.6895334574 0.2017841282 13.329 < 0.0000000000000002
## RoofMatlMembran 3.0848381418 0.2504240198 12.318 < 0.0000000000000002
## RoofMatlMetal 2.9039006924 0.2501389062 11.609 < 0.0000000000000002
## RoofMatlRoll 2.7307246681 0.2304850908 11.848 < 0.0000000000000002
## RoofMatlTar&Grv 2.6904741834 0.2183561179 12.321 < 0.0000000000000002
## RoofMatlWdShake 2.6359194140 0.2134980999 12.346 < 0.0000000000000002
## RoofMatlWdShngl 2.7447125556 0.2055421217 13.354 < 0.0000000000000002
## RoofStyleGable -0.0096175087 0.0802063945 -0.120 0.904574
## RoofStyleGambrel -0.0159178401 0.0877672585 -0.181 0.856111
## RoofStyleHip -0.0080760950 0.0804007340 -0.100 0.920005
## RoofStyleMansard 0.0346144904 0.0919610809 0.376 0.706681
## RoofStyleShed 0.4638968258 0.1634599236 2.838 0.004613
## SaleConditionAdjLand 0.1168684962 0.0628761687 1.859 0.063303
## SaleConditionAlloca 0.0554467172 0.0397875744 1.394 0.163696
## SaleConditionFamily 0.0258534029 0.0269496761 0.959 0.337582
## SaleConditionNormal 0.0653567677 0.0128309290 5.094 0.000000405159427511
## SaleConditionPartial 0.0007897260 0.0660326898 0.012 0.990460
## SaleTypeCon 0.0613609367 0.0778190582 0.789 0.430549
## SaleTypeConLD 0.1087649318 0.0434547416 2.503 0.012443
## SaleTypeConLI -0.0146219111 0.0511111884 -0.286 0.774864
## SaleTypeConLw 0.0395145254 0.0539319975 0.733 0.463895
## SaleTypeCWD 0.0537718839 0.0570825462 0.942 0.346374
## SaleTypeNew 0.0968617403 0.0685646626 1.413 0.157991
## SaleTypeOth 0.0766193354 0.0644892139 1.188 0.235022
## SaleTypeWD -0.0146968386 0.0185182055 -0.794 0.427554
## ScreenPorch 0.0002681291 0.0000549942 4.876 0.000001224333255908
## StreetPave 0.0884762183 0.0508661268 1.739 0.082212
## UtilitiesNoSeWa -0.1689408194 0.1128014532 -1.498 0.134468
## WoodDeckSF 0.0000806656 0.0000258081 3.126 0.001815
## YearBuilt 0.0019809563 0.0003460027 5.725 0.000000012922442925
## YearRemodAdd 0.0008693564 0.0002396610 3.627 0.000298
## FenceDummy_ -0.0141490258 0.0081421025 -1.738 0.082499
## LowQualFinSF -0.0001102286 0.0000715018 -1.542 0.123419
## ExterCond_.L 0.1635372825 0.1005723732 1.626 0.104188
## ExterCond_.Q -0.0672374281 0.0828442627 -0.812 0.417169
## ExterCond_.C 0.0706441893 0.0513250663 1.376 0.168942
## ExterCond_^4 0.0057785861 0.0216336964 0.267 0.789428
##
## (Intercept) .
## `3SsnPorch` .
## BsmtExposure_.L
## BsmtExposure_.Q
## BsmtExposure_.C
## BsmtExposure_^4
## BsmtFinSF1 ***
## BsmtFinSF2 ***
## BsmtFullBath **
## BsmtUnfSF ***
## CentralAir_Yes ***
## Condition1Feedr .
## Condition1Norm ***
## Condition1PosA
## Condition1PosN **
## Condition1RRAe
## Condition1RRAn .
## Condition1RRNe
## Condition1RRNn
## Condition2Feedr
## Condition2Norm
## Condition2PosA .
## Condition2PosN ***
## Condition2RRAe *
## Condition2RRAn
## Condition2RRNn
## ElectricalFuseF
## ElectricalFuseP
## ElectricalMix
## ElectricalSBrkr *
## EnclosedPorch *
## Exterior1stAsphShn
## Exterior1stBrkComm .
## Exterior1stBrkFace *
## Exterior1stCBlock
## Exterior1stCemntBd
## Exterior1stHdBoard
## Exterior1stImStucc
## Exterior1stMetalSd
## Exterior1stPlywood
## Exterior1stStone
## Exterior1stStucco
## Exterior1stVinylSd
## Exterior1stWd Sdng
## Exterior1stWdShing
## Fireplaces ***
## FoundationCBlock
## FoundationPConc .
## FoundationSlab
## FoundationStone .
## FoundationWood *
## FullBath **
## FunctionalMaj2 ***
## FunctionalMin1
## FunctionalMin2
## FunctionalMod *
## FunctionalSev *
## FunctionalTyp
## GarageArea ***
## GarageCars .
## GarageCond_.L
## GarageCond_.Q
## GarageCond_.C
## GarageCond_^4 .
## GarageCond_^5 *
## GarageQual_.L
## GarageQual_.Q
## GarageQual_.C
## GarageQual_^4
## GarageQual_^5
## GarageType_2Types **
## GarageType_Attchd
## GarageType_Basment
## GarageType_BuiltIn
## GarageType_CarPort
## GarageType_Detchd
## GrLivArea ***
## HalfBath ***
## HeatingGasA
## HeatingGasW .
## HeatingGrav
## HeatingOthW
## HeatingWall
## HeatingQC_.L
## HeatingQC_.Q
## HeatingQC_.C
## HeatingQC_^4
## KitchenAbvGr *
## KitchenQual_.L
## KitchenQual_.Q *
## KitchenQual_.C .
## LandSlopeMod
## LandSlopeSev ***
## LotArea ***
## LotConfigCulDSac *
## LotConfigFR2
## LotConfigFR3
## LotConfigInside .
## LotFrontage_2 *
## MasVnrType_BrkCmn
## MasVnrType_BrkFace .
## MasVnrType_Stone
## MSSubClass_30 **
## MSSubClass_40
## MSSubClass_45
## MSSubClass_50
## MSSubClass_60
## MSSubClass_70
## MSSubClass_75
## MSSubClass_80
## MSSubClass_85
## MSSubClass_90
## MSSubClass_120 .
## MSSubClass_160 ***
## MSSubClass_180
## MSSubClass_190
## MSZoningFV ***
## MSZoningRH ***
## MSZoningRL ***
## MSZoningRM ***
## NeighborhoodBlueste
## NeighborhoodBrDale
## NeighborhoodBrkSide
## NeighborhoodClearCr
## NeighborhoodCollgCr
## NeighborhoodCrawfor **
## NeighborhoodEdwards *
## NeighborhoodGilbert
## NeighborhoodIDOTRR
## NeighborhoodMeadowV *
## NeighborhoodMitchel
## NeighborhoodNAmes
## NeighborhoodNoRidge
## NeighborhoodNPkVill
## NeighborhoodNridgHt *
## NeighborhoodNWAmes
## NeighborhoodOldTown
## NeighborhoodSawyer
## NeighborhoodSawyerW
## NeighborhoodSomerst
## NeighborhoodStoneBr **
## NeighborhoodSWISU
## NeighborhoodTimber
## NeighborhoodVeenker
## OverallCond_.L
## OverallCond_.Q ***
## OverallCond_.C ***
## OverallCond_^4 *
## OverallCond_^5
## OverallCond_^6
## OverallCond_^7 .
## OverallCond_^8
## OverallQual_.L ***
## OverallQual_.Q ***
## OverallQual_.C ***
## OverallQual_^4 **
## OverallQual_^5
## OverallQual_^6
## OverallQual_^7
## OverallQual_^8
## OverallQual_^9
## PoolArea .
## PoolQC_.L
## PoolQC_.Q
## PoolQC_.C **
## RoofMatlCompShg ***
## RoofMatlMembran ***
## RoofMatlMetal ***
## RoofMatlRoll ***
## RoofMatlTar&Grv ***
## RoofMatlWdShake ***
## RoofMatlWdShngl ***
## RoofStyleGable
## RoofStyleGambrel
## RoofStyleHip
## RoofStyleMansard
## RoofStyleShed **
## SaleConditionAdjLand .
## SaleConditionAlloca
## SaleConditionFamily
## SaleConditionNormal ***
## SaleConditionPartial
## SaleTypeCon
## SaleTypeConLD *
## SaleTypeConLI
## SaleTypeConLw
## SaleTypeCWD
## SaleTypeNew
## SaleTypeOth
## SaleTypeWD
## ScreenPorch ***
## StreetPave .
## UtilitiesNoSeWa
## WoodDeckSF **
## YearBuilt ***
## YearRemodAdd ***
## FenceDummy_ .
## LowQualFinSF
## ExterCond_.L
## ExterCond_.Q
## ExterCond_.C
## ExterCond_^4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1017 on 1249 degrees of freedom
## Multiple R-squared: 0.9441, Adjusted R-squared: 0.9352
## F-statistic: 106 on 199 and 1249 DF, p-value: < 0.00000000000000022
O modelo Stepwise fornece um R² de 93.5 %, o seja, atinge um número
similar de ajustamento com menos variáveis.
autoplot(houses_model1_step)Modelo 3
##Model 3
houses_model3 <- lm(SalePrice_log ~ OverallQual_ + GrLivArea + GarageArea + TotalBsmtSF + YearBuilt ,
data = ds2_all_train)
summary(houses_model3)##
## Call:
## lm(formula = SalePrice_log ~ OverallQual_ + GrLivArea + GarageArea +
## TotalBsmtSF + YearBuilt, data = ds2_all_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.21210 -0.07304 0.00956 0.08947 0.61658
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.13206662 0.39875118 15.378 < 0.0000000000000002 ***
## OverallQual_.L 1.08854209 0.08410090 12.943 < 0.0000000000000002 ***
## OverallQual_.Q -0.21963517 0.07058379 -3.112 0.0019 **
## OverallQual_.C -0.09582500 0.06330631 -1.514 0.1303
## OverallQual_^4 -0.02059410 0.06251442 -0.329 0.7419
## OverallQual_^5 -0.19411898 0.05934152 -3.271 0.0011 **
## OverallQual_^6 0.00787750 0.04984673 0.158 0.8745
## OverallQual_^7 -0.05997238 0.03632255 -1.651 0.0989 .
## OverallQual_^8 0.01007919 0.02311834 0.436 0.6629
## OverallQual_^9 -0.01790371 0.01285412 -1.393 0.1639
## GrLivArea 0.00025711 0.00001180 21.785 < 0.0000000000000002 ***
## GarageArea 0.00025470 0.00002816 9.045 < 0.0000000000000002 ***
## TotalBsmtSF 0.00011249 0.00001355 8.304 0.000000000000000229 ***
## YearBuilt 0.00260631 0.00020270 12.858 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.173 on 1435 degrees of freedom
## Multiple R-squared: 0.8142, Adjusted R-squared: 0.8125
## F-statistic: 483.6 on 13 and 1435 DF, p-value: < 0.00000000000000022
autoplot(houses_model3)Modelo 4
##Model 4
houses_model4 <- lm(SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF + YearBuilt ,
data = ds2_all_train)
summary(houses_model4)##
## Call:
## lm(formula = SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF +
## YearBuilt, data = ds2_all_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.82184 -0.07944 0.00853 0.09530 0.66933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.42626133 0.39315138 6.171 0.000000000878 ***
## GrLivArea 0.00035472 0.00001186 29.913 < 0.0000000000000002 ***
## GarageArea 0.00034221 0.00003172 10.788 < 0.0000000000000002 ***
## TotalBsmtSF 0.00016446 0.00001482 11.099 < 0.0000000000000002 ***
## YearBuilt 0.00442584 0.00020298 21.804 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.199 on 1444 degrees of freedom
## Multiple R-squared: 0.7525, Adjusted R-squared: 0.7518
## F-statistic: 1097 on 4 and 1444 DF, p-value: < 0.00000000000000022
autoplot(houses_model4)Modelo 5
##Model 5
houses_model5 <- lm(SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF + YearBuilt + SaleCondition,
data = ds2_all_train)
summary(houses_model5)##
## Call:
## lm(formula = SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF +
## YearBuilt + SaleCondition, data = ds2_all_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.84368 -0.08098 0.00775 0.09809 0.65373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.71925368 0.39685379 6.852 0.0000000000108 ***
## GrLivArea 0.00035463 0.00001167 30.381 < 0.0000000000000002 ***
## GarageArea 0.00032355 0.00003149 10.276 < 0.0000000000000002 ***
## TotalBsmtSF 0.00016397 0.00001470 11.151 < 0.0000000000000002 ***
## YearBuilt 0.00421979 0.00020497 20.588 < 0.0000000000000002 ***
## SaleConditionAdjLand 0.09175959 0.10012749 0.916 0.360
## SaleConditionAlloca 0.01625644 0.06242337 0.260 0.795
## SaleConditionFamily 0.01669116 0.04790987 0.348 0.728
## SaleConditionNormal 0.13074958 0.02038860 6.413 0.0000000001932 ***
## SaleConditionPartial 0.17255486 0.02795272 6.173 0.0000000008692 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1956 on 1439 degrees of freedom
## Multiple R-squared: 0.7617, Adjusted R-squared: 0.7602
## F-statistic: 511.1 on 9 and 1439 DF, p-value: < 0.00000000000000022
autoplot(houses_model5)Modelo 6
##Model 6
houses_model6 <- lm(SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF + YearBuilt + MSSubClass_,
data = ds2_all_train)
summary(houses_model6)##
## Call:
## lm(formula = SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF +
## YearBuilt + MSSubClass_, data = ds2_all_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.62169 -0.08364 0.00393 0.09431 0.65494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.36389353 0.61597915 0.591 0.554777
## GrLivArea 0.00035922 0.00001718 20.915 < 0.0000000000000002 ***
## GarageArea 0.00030443 0.00003040 10.014 < 0.0000000000000002 ***
## TotalBsmtSF 0.00012486 0.00001873 6.668 0.00000000003690232 ***
## YearBuilt 0.00550775 0.00031502 17.484 < 0.0000000000000002 ***
## MSSubClass_30 -0.05177832 0.02813617 -1.840 0.065935 .
## MSSubClass_40 0.07563853 0.09469935 0.799 0.424584
## MSSubClass_45 0.06002185 0.05624106 1.067 0.286051
## MSSubClass_50 0.01802657 0.02268266 0.795 0.426903
## MSSubClass_60 -0.04013343 0.02088366 -1.922 0.054834 .
## MSSubClass_70 0.17070546 0.03289483 5.189 0.00000024128523375 ***
## MSSubClass_75 0.09443019 0.05450907 1.732 0.083422 .
## MSSubClass_80 0.02283255 0.02690850 0.849 0.396287
## MSSubClass_85 0.01245897 0.04309661 0.289 0.772551
## MSSubClass_90 -0.23023837 0.02857599 -8.057 0.00000000000000163 ***
## MSSubClass_120 0.01023796 0.02278330 0.449 0.653238
## MSSubClass_160 -0.20077215 0.02784303 -7.211 0.00000000000089784 ***
## MSSubClass_180 -0.20876303 0.06083173 -3.432 0.000617 ***
## MSSubClass_190 -0.02665014 0.03880041 -0.687 0.492287
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1878 on 1430 degrees of freedom
## Multiple R-squared: 0.7819, Adjusted R-squared: 0.7791
## F-statistic: 284.8 on 18 and 1430 DF, p-value: < 0.00000000000000022
autoplot(houses_model6)Cross-Validation
Agora que já temos uma idéa de quais variáveis deverão ser incluídas
no modelo, vamos fazer o cross validation.
# Model 1: all variables
set.seed(123)
cv_model1 <- train(
SalePrice_log ~ .,
data = ds2_all_train[,-c(1,2)],
method = "lm",
trControl = trainControl(method = "cv", number = 10)
)
# Model 2: Stepwise selected
set.seed(123)
cv_model2 <- train(
SalePrice_log ~ `3SsnPorch` + BsmtExposure_ + BsmtFinSF1 + BsmtFinSF2 +
BsmtFullBath + BsmtUnfSF + CentralAir_ + Condition1 + Condition2 +
Electrical + EnclosedPorch + Exterior1st + Fireplaces + Foundation +
FullBath + Functional + GarageArea + GarageCars + GarageCond_ +
GarageQual_ + GarageType_ + GrLivArea + HalfBath + Heating +
HeatingQC_ + KitchenAbvGr + KitchenQual_ + LandSlope + LotArea +
LotConfig + LotFrontage_2 + MasVnrType_ + MSSubClass_ + MSZoning +
Neighborhood + OverallCond_ + OverallQual_ + PoolArea + PoolQC_ +
RoofMatl + RoofStyle + SaleCondition + SaleType + ScreenPorch +
Street + Utilities + WoodDeckSF + YearBuilt + YearRemodAdd +
FenceDummy_ + LowQualFinSF + ExterCond_ ,
data = ds2_all_train,
method = "lm",
trControl = trainControl(method = "cv", number = 10)
)
# Model 3:
set.seed(123)
cv_model3 <- train(
SalePrice_log ~ OverallQual_ + GrLivArea + GarageArea + TotalBsmtSF + YearBuilt,
data = ds2_all_train,
method = "lm",
trControl = trainControl(method = "cv", number = 10)
)
# Model 4:
set.seed(123)
cv_model4 <- train(
SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF + YearBuilt,
data = ds2_all_train,
method = "lm",
trControl = trainControl(method = "cv", number = 10)
)
# Model 5:
set.seed(123)
cv_model5 <- train(
SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF + YearBuilt + SaleCondition,
data = ds2_all_train,
method = "lm",
trControl = trainControl(method = "cv", number = 10)
)
# Model 6:
set.seed(123)
cv_model6 <- train(
SalePrice_log ~ GrLivArea + GarageArea + TotalBsmtSF + YearBuilt + MSSubClass_,
data = ds2_all_train,
method = "lm",
trControl = trainControl(method = "cv", number = 10)
)# extract out of sample performance measures
rbind(
modelo_1 = cv_model1$results,
modelo_stepwise = cv_model2$results,
modelo_3 = cv_model3$results,
modelo_4 = cv_model4$results,
modelo_5 = cv_model5$results,
modelo_6 = cv_model6$results
) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| intercept | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD | |
|---|---|---|---|---|---|---|---|
| modelo_1 | TRUE | 0.1984153 | 0.7792703 | 0.0989773 | 0.0827491 | 0.1320839 | 0.0125427 |
| modelo_stepwise | TRUE | 0.1942762 | 0.7846206 | 0.0934100 | 0.0894963 | 0.1439271 | 0.0143781 |
| modelo_3 | TRUE | 0.1751786 | 0.8094793 | 0.1178691 | 0.0358884 | 0.0747034 | 0.0109550 |
| modelo_4 | TRUE | 0.1989073 | 0.7581153 | 0.1313164 | 0.0453922 | 0.1012749 | 0.0110731 |
| modelo_5 | TRUE | 0.1962135 | 0.7645059 | 0.1299365 | 0.0448691 | 0.0995772 | 0.0098148 |
| modelo_6 | TRUE | 0.1892963 | 0.7808777 | 0.1251320 | 0.0423780 | 0.0913277 | 0.0098892 |
Note que com o cross-validation, o R² dos modelos 1 e 2 cairam de
93.5 % e 93.5 % respectivamente, para 77.9 % e 78.5 .
Avaliação [Evaluation]
Usando o quadro acima, decidimos por usar o modelo 6 para prever a
base de teste.
O modelo 6 nos dá um ajuste de 78.1 %. Analisando a tablea abaixo temos
que
summary(cv_model6)##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.62169 -0.08364 0.00393 0.09431 0.65494
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.36389353 0.61597915 0.591 0.554777
## GrLivArea 0.00035922 0.00001718 20.915 < 0.0000000000000002 ***
## GarageArea 0.00030443 0.00003040 10.014 < 0.0000000000000002 ***
## TotalBsmtSF 0.00012486 0.00001873 6.668 0.00000000003690232 ***
## YearBuilt 0.00550775 0.00031502 17.484 < 0.0000000000000002 ***
## MSSubClass_30 -0.05177832 0.02813617 -1.840 0.065935 .
## MSSubClass_40 0.07563853 0.09469935 0.799 0.424584
## MSSubClass_45 0.06002185 0.05624106 1.067 0.286051
## MSSubClass_50 0.01802657 0.02268266 0.795 0.426903
## MSSubClass_60 -0.04013343 0.02088366 -1.922 0.054834 .
## MSSubClass_70 0.17070546 0.03289483 5.189 0.00000024128523375 ***
## MSSubClass_75 0.09443019 0.05450907 1.732 0.083422 .
## MSSubClass_80 0.02283255 0.02690850 0.849 0.396287
## MSSubClass_85 0.01245897 0.04309661 0.289 0.772551
## MSSubClass_90 -0.23023837 0.02857599 -8.057 0.00000000000000163 ***
## MSSubClass_120 0.01023796 0.02278330 0.449 0.653238
## MSSubClass_150 NA NA NA NA
## MSSubClass_160 -0.20077215 0.02784303 -7.211 0.00000000000089784 ***
## MSSubClass_180 -0.20876303 0.06083173 -3.432 0.000617 ***
## MSSubClass_190 -0.02665014 0.03880041 -0.687 0.492287
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1878 on 1430 degrees of freedom
## Multiple R-squared: 0.7819, Adjusted R-squared: 0.7791
## F-statistic: 284.8 on 18 and 1430 DF, p-value: < 0.00000000000000022
Já que calculamos o modelo usando o log da variável resposta, para
interpretá-lo, temos que fazer a exponencial das variáveis.
exp(summary(cv_model6)$coefficients[,1])## (Intercept) GrLivArea GarageArea TotalBsmtSF YearBuilt
## 1.4389210 1.0003593 1.0003045 1.0001249 1.0055229
## MSSubClass_30 MSSubClass_40 MSSubClass_45 MSSubClass_50 MSSubClass_60
## 0.9495393 1.0785726 1.0618597 1.0181900 0.9606612
## MSSubClass_70 MSSubClass_75 MSSubClass_80 MSSubClass_85 MSSubClass_90
## 1.1861413 1.0990324 1.0230952 1.0125369 0.7943442
## MSSubClass_120 MSSubClass_160 MSSubClass_180 MSSubClass_190
## 1.0102905 0.8180988 0.8115875 0.9737018
Ou seja, o preço de venda das casas no modelo 3 é emplicado pelas variáveis:
- MSSubClass_
- GrLivArea
- GarageArea
- TotalBsmtSF
- YearBuilt
Interpretando algumas das variáveis, temos que para cada 1 unidade de
mudança na variável Área Total do porão (TotalBsmtSF) o preço de venda
da casa aumenta em 1.0001249 . Para o caso da variável que avalia a o
tipo de residência (MSSubClass_), como essa é uma variável qualitativa
notamos que o fator 20 (casas de 1 andar contruídas depois de 1945) está
incluido no intercepto, mas para o caso das casas de 1 andar contruidas
até 1945 (30), temos que o preço de venda sa casa aumenta em 1.0785726
.
Gráfico da Variável Predita vs Atual
ds2_all_train$SalePrice_log_pred <- predict(cv_model6, ds2_all_train)
ds2_all_train$SalePrice_pred <- exp(ds2_all_train$SalePrice_log_pred)ggplot(data=ds2_all_train, aes(x=c(1:nrow(ds2_all_train)))) +
geom_point(aes(y = SalePrice, color = "Actual"), alpha = 0.5) +
geom_point(aes(y = SalePrice_pred, color = "Predicted"), alpha = 0.5) +
#labs(color='Transported') +
ggtitle("Gráfico Variável SalePredict") +
ylab("SalePrice") +
xlab("ID") Aplicação [Deployment]
Vamos então para a parte mais esperada, fazer a predição da base Test. Utilizo o Modelo 6 para fazer a predição:
ds2_all_test$SalePrice_log_pred <- predict(cv_model6, ds2_all_test)
ds2_all_test$SalePrice_pred <- exp(ds2_all_test$SalePrice_log_pred)Faço o gráfico mostrando a variável predita com a variável resposta que neste caso veio do arquivo sample_submission:
ggplot(data=ds2_all_test, aes(x=c(1:nrow(ds2_all_test)))) +
geom_point(aes(y = SalePrice, color = "Actual"), alpha = 0.5) +
geom_point(aes(y = SalePrice_pred, color = "Predicted"), alpha = 0.5) +
#labs(color='Transported') +
ggtitle("Gráfico Variável SalePredict (base test)") +
ylab("SalePrice") +
xlab("ID") Aparentemente, tivemos alsuns valores bem fora da dos que seriam os
corretos. Vamos verificar a acurácia desse modelo comparando com as
respostas que estavam no aquivo sample_submission, lembrando
que isso não acontece no machine learning, na realidade não temos a
resposta, mas vale fazer essa verificação aqui.
MSE <- ds2_all_test %>%
mutate(error = SalePrice_pred - SalePrice,
sq.error = error^2) %>%
summarise(MSE = mean(sq.error))
RMSE <- sqrt(MSE)
table1 <- cbind(MSE, RMSE)
colnames(table1) <- c("MSE", "RMSE")
table2 <- cbind(cv_model3$results$MAE, cv_model3$results$RMSE)
colnames(table2) <- c("MSE", "RMSE")
table3 <- rbind(table1,table2)
row.names(table3) <- c("test", "train")
round(table3,2) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| MSE | RMSE | |
|---|---|---|
| test | 4830504118.29 | 69501.83 |
| train | 0.12 | 0.18 |
Fazendo essa comparação, notamos que o erro da base de teste é muito
grande, mostrando que esse modelo não tem um bom ajuste nessa base. Isso
pode ser um indicativo que a base de teste era muito diferente da base
de treino.
Por exemplo, se pegarmos o summary da variável resposta de ambas as
bases:
summary(ds2_all_train$SalePrice)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34900 129900 162000 180545 214000 755000
summary(ds2_all_test$SalePrice)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 135751 168591 179188 178952 186599 281644
Já podemos ver que a base de treino tem uma variabilidade maior, indo
de um valor pequeno até um muito grande, já a base de teste possui uma
variabilidade menor, indicando assim que outliers no modelo de treino
deveriam ter sido melhor tratado, pois podem estar causando
interferencia no modelo preditivo final.
Para se obter um melhor ajuste seria necessário voltar alguns passos e
rever toda essa modelagem. Contudo, na vida real, no machine leaning,
não temos a resposta.
Exportando o arquivo final
test_keep <- c("Id", "SalePrice_pred")
ds2_test_final_submission <- ds2_all_test[,test_keep]
head(ds2_test_final_submission) %>%
kbl %>% kable_paper(c("striped", "hover"), full_width = F, fixed_thead = T)| Id | SalePrice_pred |
|---|---|
| 1461 | 135783.6 |
| 1462 | 145277.1 |
| 1463 | 193013.7 |
| 1464 | 191595.4 |
| 1465 | 183365.9 |
| 1466 | 184321.6 |
write.csv(ds2_test_final_submission,".\\final_submission_house.csv", row.names = FALSE)